Design of Scalable Data-Centers over Emerging Technologies

Overview

Current data-centers lack efficient support for the intelligent services that are becoming common requirements today, such as document caching and cooperation among caching servers, monitoring and managing limited physical resources, load balancing, and controlling overload scenarios. At the same time, System Area Network (SAN) technology has made rapid advances in recent years. Besides high performance, these modern interconnects provide a range of novel features with hardware support (e.g., RDMA, atomic operations, I/OAT). We propose a framework comprising three layers (communication protocol support, data-center service primitives, and advanced data-center services) that work together to tackle the issues associated with existing data-centers. The figure in the Description section below shows the main components of our framework.

Objectives

  • Leverage the advanced features of modern interconnects to design scalable data-centers.
  • Design advanced communication protocols and subsystems to boost application performance.
  • Provide high-performance advanced system services, such as dynamic resource adaptation, dynamic content caching, and admission control, for large-scale data-centers.
  • Provide high-performance lower-level service primitives, such as distributed locking, soft shared state, and global memory aggregation, for data-centers.

Description

[Figure: the proposed three-layer framework, comprising communication protocol support, data-center service primitives, and advanced data-center services]

On-going Work

We are currently extending our research focus in several directions:

  • Leverage the features of InfiniBand WAN and iWARP-capable interconnects such as 10-GigE to design scalable, geographically distributed data-centers.
  • Design efficient service primitives using technologies such as on-chip DMA and multi-core architectures to enhance data-center performance.
  • Provide advanced data-center services, such as admission control and QoS, by utilizing these service primitives for large-scale data-centers.

Results

Our testbed consists of a 64-node cluster (8 cores per node) with proxy servers running Apache or Squid, web servers running Apache 2.0, application servers running PHP, and database servers running MySQL or DB2. For our evaluation, we use several workloads, including TPC-W, SPECweb, the Rice University auction benchmarks RUBiS and RuBBoS, and the World Cup traces. As mentioned earlier, our work spans several research directions, described below.

Advanced Communication Protocols and Subsystems for Data-Centers

Existing data-center components such as Apache, PHP, and MySQL are typically written to the sockets interface over the TCP/IP communication protocol. The advanced communication protocols layer aims at transparently improving the communication performance of such applications by taking advantage of the mechanisms and features provided by modern networks such as IBA and 10GigE. The goal of these advanced protocols and subsystems is to maintain the sockets semantics so that existing data-center components do not need to be modified. In particular, we evaluate the Sockets Direct Protocol (SDP) and propose several design alternatives, such as Zero-Copy SDP (ZSDP) and Asynchronous Zero-Copy SDP (AZ-SDP), to improve the performance of data-center applications. Our micro-benchmark results [ISPASS'04] show that SDP provides up to 2.7 times better bandwidth than the native sockets implementation over InfiniBand (IPoIB) and significantly better latency for large message sizes. Further, our evaluations of the AZ-SDP stack [CAC'06] show up to 35% improvement over the ZSDP stack and up to a factor of two improvement in bandwidth over the SDP stack.
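Because SDP preserves sockets semantics, enabling it typically requires little or no application change. The following is a minimal sketch of this transparency, assuming the OFED convention of an AF_INET_SDP address family (commonly defined as 27); in practice, unmodified binaries are usually redirected to SDP by preloading libsdp, with no source change at all.

    /* Minimal sketch: opening a stream socket over SDP instead of TCP.
     * AF_INET_SDP follows the OFED convention (assumed here to be 27);
     * everything after socket() is the ordinary sockets API. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <netinet/in.h>

    #ifndef AF_INET_SDP
    #define AF_INET_SDP 27          /* OFED's SDP address family (assumption) */
    #endif

    int main(void)
    {
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket(AF_INET_SDP)");
            fd = socket(AF_INET, SOCK_STREAM, 0);   /* fall back to TCP */
            if (fd < 0)
                return 1;
        }

        struct sockaddr_in addr;
        memset(&addr, 0, sizeof(addr));
        addr.sin_family      = AF_INET;             /* addressing stays IPv4 */
        addr.sin_port        = htons(8080);
        addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

        /* connect()/send()/recv() then behave exactly as over TCP. */
        if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0)
            perror("connect");

        close(fd);
        return 0;
    }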

Results:

[Figures: client response time with IPoIB and SDP (left); split-up of response time at the proxy (right)]

The figures above show the client response times and the split-up of the response time observed in a data-center environment using IPoIB and SDP. SDP clearly outperforms IPoIB for file transfers larger than 128 KB. To understand the lack of benefits for small file sizes, we took a similar split-up of the response time perceived by the client. Although the "web-server time" reduces significantly, the time spent at the proxy is higher for SDP than for IPoIB. Comparing this split-up for SDP and IPoIB showed a significant difference in the time the proxy takes to connect to the back-end server. This high connection time in the current SDP implementation, about 500 microseconds more than IPoIB, masks SDP's data-transfer benefits for small file transfers.

Advanced System Services for Emerging Data-Centers

The advanced data-center services are intelligent services that are critical to the efficient functioning of data-centers; they handle requirements such as document caching, managing limited physical resources, admission control, and prioritization and QoS mechanisms. In our proposed design, we utilize the novel features of emerging technologies to provide efficient data-center services that lead to higher data-center throughput and lower response time. Specifically, the dynamic content caching service deals with efficient and load-resilient caching techniques for dynamically generated content, while the active resource adaptation service (used interchangeably with resource reconfiguration) deals with on-the-fly, scalable management and adaptation of various system resources.

Results:

[Figures: data-center throughput with increasing compute threads (left); performance with a Zipf trace (right)]

The figure on the left shows the benefits of our RDMA-based services [SAN'04] in a data-center environment. As the number of compute threads increases, we see considerable performance degradation in the no-cache case as well as in the sockets-based implementations using IPoIB and SDP. In contrast, the client-polling architecture using VAPI shows no degradation, thanks to the one-sided semantics of RDMA. The figure on the right shows the benefits of our RDMA-based resource monitoring mechanism [RAIT'06]. Owing to its highly efficient, synchronous, and accurate monitoring (RDMA-Sync and e-RDMA-Sync), we observe close to 35% improvement over the traditional sockets-based implementations (Socket-Sync and Socket-Async).
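The essence of the client-polling design is that the front-end node inspects back-end state with one-sided RDMA reads, costing the loaded server no CPU cycles. The sketch below illustrates the idea against the OpenFabrics verbs API (our original implementation used VAPI); the function and buffer names are ours, and the connected queue pair, memory registration, and remote address/rkey exchange are assumed to be already in place.

    /* Hedged sketch: fetch a remote version counter with a one-sided
     * RDMA read, so a cache-validity check never interrupts the server.
     * Assumes qp is a connected RC queue pair, local_buf lies inside the
     * registered region mr, and (remote_addr, rkey) were exchanged at
     * connection setup. Returns 0 on success, -1 on error. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    int read_remote_version(struct ibv_qp *qp, struct ibv_cq *cq,
                            struct ibv_mr *mr, uint64_t *local_buf,
                            uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge;
        memset(&sge, 0, sizeof(sge));
        sge.addr   = (uintptr_t)local_buf;
        sge.length = sizeof(uint64_t);
        sge.lkey   = mr->lkey;

        struct ibv_send_wr wr, *bad_wr;
        memset(&wr, 0, sizeof(wr));
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.opcode              = IBV_WR_RDMA_READ;   /* one-sided operation */
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = remote_addr;
        wr.wr.rdma.rkey        = rkey;

        if (ibv_post_send(qp, &wr, &bad_wr))
            return -1;

        /* Busy-poll the completion queue until the read completes. */
        struct ibv_wc wc;
        int n;
        while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
            ;
        if (n < 0 || wc.status != IBV_WC_SUCCESS)
            return -1;

        /* *local_buf now holds the server's version counter. */
        return 0;
    }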

[Figure: response time with admission control under the World Cup trace]

The figure above shows the benefits of an admission control service built on the RDMA-based resource monitoring mechanism [CCGrid'08]. Owing to the accurate and efficient resource monitoring mechanism (AC(RDMA)), we see close to 17% and 36% improvement in response time with the World Cup trace compared to admission control using TCP/IP (AC(TCP/IP)) and a system with no admission control mechanism (No AC), respectively.
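The admission decision itself is simple once the load information is cheap and accurate. The hypothetical sketch below shows the shape of such a policy; the threshold and names are illustrative rather than parameters from [CCGrid'08], and monitored_load stands in for the word that the RDMA-based monitor keeps up to date.

    /* Illustrative admission-control check: admit a request only while
     * the monitored back-end utilization is below a threshold. The
     * threshold value and function name are hypothetical. */
    #include <stdbool.h>
    #include <stdint.h>

    #define LOAD_THRESHOLD 85   /* percent utilization; assumed policy knob */

    /* monitored_load: the utilization value the RDMA-based monitor
     * refreshes; passed as a parameter here for clarity. */
    bool admit_request(uint32_t monitored_load)
    {
        /* Requests rejected here can be queued or sent an overload
         * response instead of degrading service for every client. */
        return monitored_load < LOAD_THRESHOLD;
    }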

Lower-Level Service Primitives for Emerging Data-Centers

The data-center service primitives and advanced data-center services layers aim at supporting intelligent services for current data-centers. Specifically, the service primitives take advantage of the advanced communication protocols, as well as the mechanisms and features of modern networks, to provide higher-level utilities that can be used both by applications and by the advanced data-center services. Several primitives, such as soft shared state, enhanced point-to-point communication, a distributed lock manager, and a global memory aggregator, are necessary for the most efficient design of the upper-level data-center services.
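As one concrete illustration, a lock primitive can exploit InfiniBand's hardware atomics so that acquiring a lock requires no CPU involvement at the lock's home node. The sketch below performs a one-sided compare-and-swap on a 64-bit lock word (0 when free, otherwise the holder's id); it conveys the flavor of DQNL-style one-sided locking rather than the actual N-CoShED protocol of [CCGrid'07], and it assumes the same pre-established verbs resources as the earlier sketch.

    /* Hedged sketch of a one-sided lock acquire using an RDMA atomic
     * compare-and-swap executed entirely by the remote HCA. The lock is
     * a 64-bit word at lock_addr on the home node: 0 when free, the
     * holder's id otherwise. Returns 1 if acquired, 0 if busy, -1 on error. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    int try_lock_remote(struct ibv_qp *qp, struct ibv_cq *cq,
                        struct ibv_mr *mr, uint64_t *result_buf,
                        uint64_t lock_addr, uint32_t rkey, uint64_t my_id)
    {
        struct ibv_sge sge;
        memset(&sge, 0, sizeof(sge));
        sge.addr   = (uintptr_t)result_buf;   /* previous value lands here */
        sge.length = sizeof(uint64_t);
        sge.lkey   = mr->lkey;

        struct ibv_send_wr wr, *bad_wr;
        memset(&wr, 0, sizeof(wr));
        wr.sg_list               = &sge;
        wr.num_sge               = 1;
        wr.opcode                = IBV_WR_ATOMIC_CMP_AND_SWP;
        wr.send_flags            = IBV_SEND_SIGNALED;
        wr.wr.atomic.remote_addr = lock_addr;    /* must be 8-byte aligned */
        wr.wr.atomic.rkey        = rkey;
        wr.wr.atomic.compare_add = 0;            /* expected value: free */
        wr.wr.atomic.swap        = my_id;        /* install our id if free */

        if (ibv_post_send(qp, &wr, &bad_wr))
            return -1;

        struct ibv_wc wc;
        int n;
        while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
            ;
        if (n < 0 || wc.status != IBV_WC_SUCCESS)
            return -1;

        /* The CAS returns the old value: 0 means we won the lock. */
        return (*result_buf == 0) ? 1 : 0;
    }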

Results:

[Figures: STORM performance with the shared state primitive (left); distributed lock manager comparison (right)]

The graph on the left shows the application-level improvement from our shared state primitive, the distributed data sharing substrate (DDSS) [HiPC'06]. STORM's performance improves by around 19% for 1K-, 10K-, and 100K-record dataset sizes using DDSS, in comparison with the traditional implementation. The graph on the right presents the performance improvement of our N-CoShED scheme [CCGrid'07] over existing schemes: (i) basic Distributed Queue-based Non-shared Locking (DQNL) and (ii) traditional Send/Receive-based Server Locking (SRSL). N-CoShED shows a 39% improvement over SRSL, and a significant improvement (up to 317% for 16 nodes) over DQNL.

Conferences & Workshops (22)


Technical Reports (4)

1 K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53.
2 K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks, OSU-CISRC-7/05-TR49.
3 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers.
4 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for Providing Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand.

Ph.D. Dissertations (3)

1 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
2 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
3 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006

M.S. Thesis (1)

1 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004

Sponsors

NSF
Intel
Mellanox
Dell