Current data-centers lack efficient support for intelligent
services that are becoming common requirements today, such as caching
documents and coordinating caching servers, efficiently monitoring and
managing limited physical resources, load balancing, and controlling
overload scenarios. At the same time, System Area Network (SAN)
technology has advanced rapidly in recent years. Besides high
performance, these modern interconnects provide a range of novel
features with hardware support (e.g., RDMA, atomic operations, and
IOAT). We propose a framework comprising three layers (communication
protocol support, data-center service primitives, and advanced
data-center services) that work together to tackle the issues
associated with existing data-centers. The following figure shows the
main components of our framework.
Main Objectives:
Leverage the advanced features of modern interconnects to
design scalable data-centers.
Design advanced communication protocols and subsystems to boost
application performance.
Provide high-performance advanced system services such as dynamic
resource adaptation, dynamic content caching, and admission control
for large-scale data-centers.
Provide high-performance lower-level service primitives such as
distributed locking, soft shared state, and global memory aggregation
for data-centers.
Pavan Balaji, Savitha Krishnamoorthy and Jiesheng Wu
Project Sponsors:
This research is supported in part by NSF grants #CNS-0403342 and
#CNS-050942; an NSF-STTR grant with RNet; and equipment donations from
Intel, Mellanox and Dell.
Our testbed consists of a cluster of 64 nodes (8 cores each) with
proxy servers using Apache or Squid, web servers using Apache 2.0,
application servers using PHP, and database servers using
MySQL/DB2. For our evaluation, we use several workloads and traces
such as TPC-W, SPECweb, auction benchmarks from Rice University such
as RUBiS and RUBBoS, and World Cup traces.
As mentioned earlier, we work on several research directions:
Existing data-center components such as Apache, PHP, and MySQL are
typically written using the sockets interface over the TCP/IP
communication protocol. The advanced communication protocols layer
aims at transparently improving the communication performance of
such applications by taking advantage of the mechanisms and features
provided by modern networks such as IBA and 10GigE. The goal of these
advanced protocols and subsystems is to maintain sockets semantics
so that existing data-center components do not need to be
modified. In particular, we evaluate the Sockets Direct Protocol (SDP)
and propose several design alternatives, such as Zero-Copy SDP (ZSDP)
and Asynchronous Zero-Copy SDP (AZ-SDP), to improve the performance of
data-center applications.
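As an illustration of what maintaining sockets semantics means in practice, the following is a minimal sketch of a sockets-based exchange over loopback TCP. The function names serve_once and echo_roundtrip are hypothetical and not part of any data-center component. The point is only that code of this shape is written against the standard sockets calls (connect, send, recv, shutdown), which is the interface a sockets-compatible protocol layer would have to preserve for applications to run unmodified.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read until the peer stops sending, then echo it back."""
    conn, _ = listener.accept()
    with conn:
        request = bytearray()
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # peer called shutdown(SHUT_WR)
                break
            request.extend(chunk)
        conn.sendall(bytes(request))


def echo_roundtrip(payload: bytes) -> bytes:
    """Send payload to a one-shot echo server on loopback and return the reply."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    server.start()

    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        client.shutdown(socket.SHUT_WR)  # signal end of request
        reply = bytearray()
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break
            reply.extend(chunk)

    server.join(timeout=5)
    listener.close()
    return bytes(reply)


if __name__ == "__main__":
    print(echo_roundtrip(b"GET /index.html"))
```

Nothing in this sketch refers to the underlying transport, which is why a layer that keeps the same stream semantics can, in principle, be substituted beneath it.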
Our micro-benchmark results [ISPASS'04] show that SDP provides up to 2.7
times higher bandwidth than the native sockets
implementation over InfiniBand (IPoIB) and significantly better
latency for large message sizes. Further, our evaluations with the AZ-SDP
stack [CAC'06] show up to 35% improvement over the ZSDP stack and up to a
factor of two bandwidth improvement over the SDP stack.
Results:
The figures above show the client response time and its breakdown
in a data-center environment using IPoIB and SDP. They show better
performance for SDP than for IPoIB for large file transfers above
128K. To understand the lack of performance benefits for small file
sizes, we examined the same breakdown of the response time perceived
by the client. Though the "web-server time" reduces significantly with
SDP, the time taken at the proxy is higher for SDP than for IPoIB.
Comparing this breakdown for SDP and IPoIB showed a significant
difference in the time the proxy takes to connect to the back-end
server. This high connection time of the current SDP implementation,
about 500 microseconds higher than IPoIB, makes the data-transfer
benefits of SDP imperceptible for small file transfers.
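The trade-off between connection setup cost and transfer bandwidth can be made concrete with a toy model: response time is taken as connection time plus file size divided by bandwidth. This is only a sketch. The IPoIB baseline figures below (100 microseconds to connect, 200 MB/s) are hypothetical values chosen for illustration and are not measurements from our testbed; only the 500-microsecond connection penalty and the 2.7x bandwidth ratio come from the text, and the model ignores proxy and web-server processing time.

```python
def response_time_us(file_bytes: float, connect_us: float, bandwidth_mb_s: float) -> float:
    """Connection setup plus transfer time, in microseconds.

    With 1 MB taken as 1e6 bytes, bytes / (MB/s) is already in microseconds.
    """
    return connect_us + file_bytes / bandwidth_mb_s


def break_even_bytes(extra_connect_us: float, slow_mb_s: float, fast_mb_s: float) -> float:
    """File size at which the faster transport's transfer savings equal its extra setup cost."""
    if fast_mb_s <= slow_mb_s:
        raise ValueError("fast_mb_s must exceed slow_mb_s")
    return extra_connect_us / (1.0 / slow_mb_s - 1.0 / fast_mb_s)


# Hypothetical baseline, chosen only to make the arithmetic concrete.
IPOIB_CONNECT_US = 100.0
IPOIB_BANDWIDTH_MB_S = 200.0
# From the text: about 500 microseconds extra setup, up to 2.7x the bandwidth.
SDP_CONNECT_US = IPOIB_CONNECT_US + 500.0
SDP_BANDWIDTH_MB_S = 2.7 * IPOIB_BANDWIDTH_MB_S

if __name__ == "__main__":
    crossover = break_even_bytes(
        SDP_CONNECT_US - IPOIB_CONNECT_US, IPOIB_BANDWIDTH_MB_S, SDP_BANDWIDTH_MB_S
    )
    print(f"break-even file size under these assumptions: {crossover:.0f} bytes")
    for size in (16 * 1024, 128 * 1024, 1024 * 1024):
        ipoib = response_time_us(size, IPOIB_CONNECT_US, IPOIB_BANDWIDTH_MB_S)
        sdp = response_time_us(size, SDP_CONNECT_US, SDP_BANDWIDTH_MB_S)
        print(f"{size:>8} bytes: IPoIB {ipoib:8.1f} us, SDP {sdp:8.1f} us")
```

Under any such parameters the model has a single crossover size: below it the fixed connection penalty dominates, and above it the bandwidth advantage does, which matches the qualitative behavior described above.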
The advanced data-center services are intelligent services that are
critical for the efficient functioning of data-centers. For example,
requirements for caching documents, managing limited physical
resources, admission control, and prioritization and QoS mechanisms
are handled by these services. In our proposed design, we utilize the
novel features of emerging technologies to provide efficient
data-center services that can lead to higher data-center throughput
and lower response time. Specifically, the dynamic content caching
service deals with efficient and load-resilient caching techniques for
dynamically generated content, while the active resource adaptation
(used interchangeably with resource reconfiguration) service deals
with on-the-fly, scalable management and adaptation of various
system resources.
Results:
The figure on the left shows the benefits of our RDMA-based services [SAN'04]
in a data-center environment. As the number of compute threads
increases, performance degrades considerably in the no-cache case as
well as in the sockets-based implementations using IPoIB and SDP.
However, the client-polling architecture using VAPI shows no
degradation in performance, owing to the one-sided semantics of RDMA:
remote reads complete without involving the CPU of the loaded node.
The figure on the right shows the benefits of the RDMA-based resource
monitoring mechanism [RAIT'06]. Because the RDMA-based schemes
(RDMA-Sync and e-RDMA-Sync) monitor resources efficiently,
synchronously, and accurately, we observe close to 35% improvement
over the traditional sockets-based implementations (Socket-Sync and
Socket-Async).
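The idea behind one-sided monitoring can be sketched as follows: a back-end node keeps its load statistics in a fixed-layout memory region, and the monitoring node copies that region without running any code on the back-end. This is an in-process emulation only, not our implementation. No RDMA library is used, and the record layout and the names RegisteredRegion, emulated_rdma_read, and poll_load are hypothetical.

```python
import struct

# Hypothetical record layout: compute threads (uint32), queued requests (uint32),
# CPU utilization (float32), little-endian.
LOAD_RECORD = struct.Struct("<IIf")


class RegisteredRegion:
    """Stands in for a memory region that a back-end node exposes for remote reads."""

    def __init__(self) -> None:
        self.buf = bytearray(LOAD_RECORD.size)

    def publish(self, compute_threads: int, queued_requests: int, cpu_utilization: float) -> None:
        """Called on the back-end node whenever its load changes."""
        LOAD_RECORD.pack_into(self.buf, 0, compute_threads, queued_requests, cpu_utilization)


def emulated_rdma_read(region: RegisteredRegion) -> bytes:
    """Emulates a one-sided read: copies the bytes without invoking any handler on the owner."""
    return bytes(region.buf)


def poll_load(region: RegisteredRegion) -> dict:
    """Monitoring-side view of the back-end node's most recently published load."""
    threads, queued, cpu = LOAD_RECORD.unpack(emulated_rdma_read(region))
    return {"compute_threads": threads, "queued_requests": queued, "cpu_utilization": cpu}


if __name__ == "__main__":
    region = RegisteredRegion()
    region.publish(compute_threads=12, queued_requests=3, cpu_utilization=0.75)
    print(poll_load(region))
```

In a sockets-based monitor the equivalent of poll_load would have to wait for a process on the back-end to be scheduled and reply, which is where the delay under heavy compute load comes from.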
This figure shows the benefits of the admission control service using the RDMA-based
resource monitoring mechanism [CCGrid'08]. With accurate and efficient
resource monitoring (AC(RDMA)), we see close to 17% and 36% improvement in
response time with the World Cup trace compared to admission control using TCP/IP (AC(TCP/IP))
and to a system with no admission control mechanism (No AC), respectively.
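A minimal sketch of how an admission decision can be driven by a monitored load sample is shown below. The thresholds, the LoadSample fields, and the AdmissionController class are hypothetical and do not describe the policy used in [CCGrid'08]. The read_load callable stands for whichever monitoring mechanism supplies the sample, so the same controller can sit on top of an RDMA-based or a TCP/IP-based monitor.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class LoadSample:
    queued_requests: int
    cpu_utilization: float  # fraction in the range 0.0 .. 1.0


class AdmissionController:
    """Admits a request only while the monitored back-end load is under both limits."""

    def __init__(self, read_load: Callable[[], LoadSample], max_queued: int, max_cpu: float) -> None:
        self._read_load = read_load
        self._max_queued = max_queued
        self._max_cpu = max_cpu

    def admit(self) -> bool:
        # A fresh sample is taken per decision; stale samples would admit or
        # reject on out-of-date information.
        sample = self._read_load()
        return (
            sample.queued_requests < self._max_queued
            and sample.cpu_utilization < self._max_cpu
        )


if __name__ == "__main__":
    current = {"sample": LoadSample(queued_requests=4, cpu_utilization=0.40)}
    controller = AdmissionController(lambda: current["sample"], max_queued=32, max_cpu=0.90)
    print(controller.admit())
    current["sample"] = LoadSample(queued_requests=40, cpu_utilization=0.95)
    print(controller.admit())
```

The quality of the decision depends directly on how current the sample is, which is why the accuracy and latency of the monitoring layer matter for admission control.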
The data-center service primitives and advanced data-center services
layers aim at supporting intelligent services for current
data-centers. Specifically, the data-center service primitives take advantage
of the advanced communication protocols as well as the mechanisms and
features of modern networks to provide higher-level utilities that can
be utilized by applications as well as the advanced data-center
services. For the most efficient design of the upper-level data-center
services, several primitives such as soft shared state, enhanced
point-to-point communication, distributed lock manager, and global
memory aggregator are necessary.
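To give a flavor of how a locking primitive can be built on network atomic operations, the sketch below emulates a remote compare-and-swap on a single lock word and builds a simple exclusive lock on top of it. It is an in-process emulation using threads and is neither the N-CoShED nor the DQNL scheme. The classes EmulatedAtomicWord and CasLock are hypothetical, and a real design would also need queuing, fairness, and shared-mode support.

```python
import threading
import time

UNLOCKED = 0  # lock word value when no node holds the lock


class EmulatedAtomicWord:
    """Stands in for a remotely accessible word supporting atomic compare-and-swap."""

    def __init__(self, value: int = UNLOCKED) -> None:
        self._value = value
        self._guard = threading.Lock()  # emulates the hardware serializing atomic operations

    def compare_and_swap(self, expected: int, new: int) -> int:
        """Return the previous value; the swap happens only if it equaled expected."""
        with self._guard:
            old = self._value
            if old == expected:
                self._value = new
            return old


class CasLock:
    """Exclusive lock: the holder's node id is stored in the lock word."""

    def __init__(self, word: EmulatedAtomicWord) -> None:
        self._word = word

    def acquire(self, node_id: int) -> None:
        if node_id == UNLOCKED:
            raise ValueError("node_id must differ from the UNLOCKED marker")
        while self._word.compare_and_swap(UNLOCKED, node_id) != UNLOCKED:
            time.sleep(0)  # yield and retry; a real scheme would avoid blind polling

    def release(self, node_id: int) -> None:
        if self._word.compare_and_swap(node_id, UNLOCKED) != node_id:
            raise RuntimeError("lock released by a node that does not hold it")


if __name__ == "__main__":
    lock = CasLock(EmulatedAtomicWord())
    lock.acquire(1)
    lock.release(1)
    print("acquired and released by node 1")
```

Because acquisition is a single atomic operation on the lock word, no process on the node that hosts the word has to take part in granting the lock.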
Results:
The graph on the left shows the application-level improvement obtained
with our shared state primitive, the distributed data sharing substrate
(DDSS) [HiPC'06]. Using DDSS, the performance of STORM improves by around
19% for 1K, 10K and 100K record dataset sizes in comparison
with the traditional implementation.
The graph on the right presents the basic performance improvement that
our scheme (N-CoShED) [CCGrid'07] shows over existing schemes: (i)
basic Distributed Queue based Non-shared Locking (DQNL) and (ii)
traditional Send/Receive-based Server Locking (SRSL). The N-CoShED
scheme shows a 39% improvement over the SRSL scheme. We also observe a
significant improvement (up to 317% for 16 nodes) over the DQNL
scheme.