Design of Scalable Data-Centers over Emerging Technologies

Overview

Current data-centers lack efficient support for intelligent services that are becoming common requirements today, such as document caching and cooperation among caching servers, monitoring and managing limited physical resources, load balancing, and controlling overload scenarios. At the same time, System Area Network (SAN) technology has made rapid advances in recent years. Besides high performance, these modern interconnects provide a range of novel features with hardware support (e.g., RDMA, atomic operations, and I/OAT). We propose a framework comprising three layers (communication protocol support, data-center service primitives, and advanced data-center services) that work together to tackle the issues associated with existing data-centers. The following figure shows the main components of our framework.

Objectives

  • Leverage the advanced features of modern interconnects to design scalable data-centers.
  • Design advanced communication protocols and subsystems to boost application performance.
  • Provide high performance advanced system services like dynamic resource adaptation, dynamic content caching, admission control, etc. for large scale data-centers.
  • Provide high performance lower-level services primitives like distributed locking, soft shared state, global memory aggregation, etc. for data-centers.

Description

On-going Work

We are currently extending our research focus in several directions:

  • Leverage the features of InfiniBand WAN and iWARP-capable interconnects such as 10-GigE to design scalable geographically distributed data-centers.
  • Design efficient service primitives using technologies such as on-chip DMA and multi-core architectures to enhance data-center performance.
  • Provide advanced data-center services, such as admission control and QoS, by utilizing the advanced service primitives for large-scale data-centers.

Results

Our testbed consists of a 64-node cluster (8 cores per node), with proxy servers running Apache or Squid, web servers running Apache 2.0, application servers running PHP, and database servers running MySQL/DB2. For our evaluation, we use several workloads and traces, including TPC-W, SPECweb, the Rice University auction benchmarks RUBiS and RUBBoS, and the WorldCup traces. As mentioned earlier, our work spans several research directions.

Advanced Communication Protocols and Subsystems for Data-Centers

Existing data-center components such as Apache, PHP, and MySQL are typically written using the sockets interface over the TCP/IP communication protocol. The advanced communication protocols layer aims at transparently improving the communication performance of such applications by taking advantage of the mechanisms and features provided by modern networks such as IBA and 10GigE. A key goal of these protocols and subsystems is to maintain sockets semantics, so that existing data-center components do not need to be modified. In particular, we evaluate the Sockets Direct Protocol (SDP) and propose several design alternatives, such as Zero-Copy SDP (ZSDP) and Asynchronous Zero-Copy SDP (AZ-SDP), to improve the performance of data-center applications. Our micro-benchmark results [ISPASS'04] show that SDP provides up to 2.7 times better bandwidth than the native sockets implementation over InfiniBand (IPoIB), as well as significantly better latency for large message sizes. Further, our evaluations of the AZ-SDP stack [CAC'06] show up to 35% improvement over the ZSDP stack and up to a factor of two bandwidth improvement over the SDP stack.
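
To illustrate the transparency argument, the sketch below shows how little a sockets application has to change to run over SDP on Linux/OFED: only the address family passed to socket() differs (OFED's libsdp preload library can even remap AF_INET to AF_INET_SDP at run time, requiring no source changes at all). This is a minimal, hypothetical client; the server address and port are placeholders, not part of our stack.

    /* Minimal sketch: pointing an ordinary sockets client at SDP.
     * Everything after socket() is standard sockets code, which is why
     * existing data-center components need no modification. */
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <arpa/inet.h>
    #include <sys/socket.h>

    #ifndef AF_INET_SDP
    #define AF_INET_SDP 27          /* address family used by OFED's SDP module */
    #endif

    int main(void)
    {
        /* Only this one line differs from a plain TCP client. */
        int fd = socket(AF_INET_SDP, SOCK_STREAM, 0);
        if (fd < 0) { perror("socket(AF_INET_SDP)"); return 1; }

        struct sockaddr_in srv;
        memset(&srv, 0, sizeof(srv));
        srv.sin_family = AF_INET;                  /* addressing stays IPv4 */
        srv.sin_port   = htons(8080);              /* placeholder port */
        inet_pton(AF_INET, "192.168.0.10", &srv.sin_addr);  /* placeholder */

        if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) < 0) {
            perror("connect");
            return 1;
        }

        const char req[] = "GET / HTTP/1.0\r\n\r\n";
        write(fd, req, sizeof(req) - 1);           /* sockets semantics preserved */

        char buf[4096];
        ssize_t n = read(fd, buf, sizeof(buf));
        if (n > 0) fwrite(buf, 1, (size_t)n, stdout);

        close(fd);
        return 0;
    }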

Results:

The figures above show the client response time and the breakdown of that response time in a data-center environment using IPoIB and SDP. SDP clearly performs better than IPoIB for large file transfers (above 128 KB). To understand the lack of benefits for small file sizes, we took a similar breakdown of the response time perceived by the client. Though the web-server time reduces significantly, the time taken at the proxy is higher for SDP than for IPoIB. Comparing this breakdown for SDP and IPoIB revealed a significant difference in the time the proxy takes to connect to the back-end server. This high connection time in the current SDP implementation, about 500 microseconds more than IPoIB's, makes the data-transfer benefits of SDP imperceptible for small file transfers.

Advanced System Services for Emerging Data-Centers

The advanced data-center services are intelligent services that are critical for the efficient functioning of data-centers; they handle requirements such as document caching, management of limited physical resources, admission control, and prioritization and QoS mechanisms. In our proposed design, we utilize the novel features of emerging technologies to provide efficient data-center services that can lead to higher data-center throughput and lower response times. Specifically, the dynamic content caching service deals with efficient and load-resilient caching techniques for dynamically generated content, while the active resource adaptation service (used interchangeably with resource reconfiguration) deals with on-the-fly, scalable management and adaptation of various system resources.

Results:

The figure on the left shows the benefits of our RDMA-based services [SAN'04] in a data-center environment. As the number of compute threads increases, we see considerable performance degradation both in the no-cache case and in the sockets-based implementations using IPoIB and SDP. However, the client-polling architecture using VAPI shows no degradation in performance, owing to the one-sided semantics of RDMA. The figure on the right shows the benefits of the RDMA-based resource monitoring mechanism [RAIT'06]. Due to its highly efficient, synchronous, and accurate resource monitoring (RDMA-Sync and e-RDMA-Sync), we observe close to 35% improvement over the traditional sockets-based implementations (Socket-Sync and Socket-Async).
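
The one-sided mechanism underlying both the client-polling architecture and the RDMA-based monitoring can be sketched as follows: the watching node fetches a counter out of the remote node's registered memory with an RDMA READ, so the remote CPU (busy with compute threads) is never involved. This is a minimal sketch over libibverbs, assuming an already-connected RC queue pair, a registered local buffer, and the remote side's advertised address/rkey; connection and registration setup are omitted, and the function and variable names are ours, not the papers'.

    /* Hedged sketch: one-sided fetch of a remote load/version counter. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    uint64_t read_remote_load(struct ibv_qp *qp, struct ibv_cq *cq,
                              struct ibv_mr *mr, volatile uint64_t *local_buf,
                              uint64_t remote_addr, uint32_t rkey)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)local_buf,    /* where the fetched value lands */
            .length = sizeof(uint64_t),
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode              = IBV_WR_RDMA_READ;  /* one-sided: remote CPU idle */
        wr.sg_list             = &sge;
        wr.num_sge             = 1;
        wr.send_flags          = IBV_SEND_SIGNALED;
        wr.wr.rdma.remote_addr = remote_addr;       /* remote counter's address */
        wr.wr.rdma.rkey        = rkey;

        if (ibv_post_send(qp, &wr, &bad_wr))
            return UINT64_MAX;                      /* post failure */

        /* Poll for completion of the READ (busy-wait for brevity). */
        struct ibv_wc wc;
        int n;
        while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
            ;
        if (n < 0 || wc.status != IBV_WC_SUCCESS)
            return UINT64_MAX;

        return *local_buf;                          /* freshly fetched value */
    }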

This figure shows the benefits of the admission control service built on the RDMA-based resource monitoring mechanism [CCGrid'08]. Due to the accurate and efficient resource monitoring (AC(RDMA)), we see close to 17% and 36% improvements in response time with the WorldCup trace, as compared to admission control using TCP/IP (AC(TCP/IP)) and a system with no admission control (No AC), respectively.
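
The admission-control decision itself can then be as simple as comparing the freshly fetched load against a threshold and rejecting early rather than letting requests queue up. The sketch below is purely illustrative; the helper and threshold are assumptions for exposition, not the [CCGrid'08] implementation.

    /* Illustrative sketch: admission control driven by monitored load. */
    #include <stdint.h>

    #define LOAD_THRESHOLD 90        /* assumed cut-off, e.g., % utilization */

    /* Assume this wraps a one-sided fetch such as read_remote_load() above. */
    extern uint64_t current_backend_load(void);

    /* Returns 1 to admit the request; 0 to reject it up front (e.g., with
     * "503 Service Unavailable") instead of letting it queue and time out. */
    int admit_request(void)
    {
        return current_backend_load() < LOAD_THRESHOLD;
    }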

Lower-Level Service Primitives for Emerging Data-Centers

The data-center service primitives and advanced data-center services layers aim at supporting intelligent services for current data-centers. Specifically, the data-center service primitives take advantage of the advanced communication protocols, as well as the mechanisms and features of modern networks, to provide higher-level utilities that can be used both by applications and by the advanced data-center services. To build the upper-level data-center services most efficiently, several primitives are necessary, including soft shared state, enhanced point-to-point communication, a distributed lock manager, and a global memory aggregator. A sketch of the one-sided locking idea follows in the results below.
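
As a concrete example of such a primitive, the basic exclusive-lock case of one-sided locking can be sketched with an RDMA atomic compare-and-swap on a lock word held in the lock manager's registered memory, so acquisition needs no remote CPU involvement. This is a hedged sketch over libibverbs, assuming a connected RC queue pair and exchanged address/rkey (setup omitted); the names are ours, and the full [CCGrid'07] design layers shared/exclusive semantics on top of this.

    /* Hedged sketch: one-sided exclusive lock attempt via RDMA atomic CAS. */
    #include <stdint.h>
    #include <string.h>
    #include <infiniband/verbs.h>

    int try_acquire_lock(struct ibv_qp *qp, struct ibv_cq *cq,
                         struct ibv_mr *mr, volatile uint64_t *result,
                         uint64_t lock_addr, uint32_t rkey, uint64_t my_id)
    {
        struct ibv_sge sge = {
            .addr   = (uintptr_t)result,       /* CAS writes the old value here */
            .length = sizeof(uint64_t),        /* IB atomics operate on 8 bytes */
            .lkey   = mr->lkey,
        };
        struct ibv_send_wr wr, *bad_wr = NULL;
        memset(&wr, 0, sizeof(wr));
        wr.opcode                = IBV_WR_ATOMIC_CMP_AND_SWP;
        wr.sg_list               = &sge;
        wr.num_sge               = 1;
        wr.send_flags            = IBV_SEND_SIGNALED;
        wr.wr.atomic.remote_addr = lock_addr;  /* 8-byte-aligned lock word */
        wr.wr.atomic.compare_add = 0;          /* expected value: 0 (unlocked) */
        wr.wr.atomic.swap        = my_id;      /* written only if it was 0 */
        wr.wr.atomic.rkey        = rkey;

        if (ibv_post_send(qp, &wr, &bad_wr))
            return -1;

        struct ibv_wc wc;
        int n;
        while ((n = ibv_poll_cq(cq, 1, &wc)) == 0)
            ;                                  /* busy-wait for brevity */
        if (n < 0 || wc.status != IBV_WC_SUCCESS)
            return -1;

        /* Acquired iff the old value read back was 0 (previously unlocked). */
        return *result == 0;
    }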

Results:

The graph on the left shows the application-level improvement achieved with our shared state primitive, the distributed data sharing substrate (DDSS) [HiPC'06]. The performance of STORM improves by around 19% for 1K-, 10K-, and 100K-record dataset sizes using DDSS, in comparison with the traditional implementation. The graph on the right presents the basic performance improvement of our scheme (N-CoShED) [CCGrid'07] over existing schemes: (i) basic Distributed Queue-based Non-shared Locking (DQNL) and (ii) traditional Send/Receive-based Server Locking (SRSL). The N-CoShED scheme shows a 39% improvement over the SRSL scheme, and a significant improvement (up to 317% for 16 nodes) over the DQNL scheme.

Conferences & Workshops (22)


Technical Reports (4)

1 K. Vaidyanathan, P. Lai, S. Narravula, and DK Panda, Benefits of Dedicating Resource Sharing Services in Data-Centers for Emerging Multi-Core Systems, OSU-CISRC-8/07-TR53
2 K. Vaidyanathan, H. Jin, S. Narravula, and DK Panda, Accurate Load Monitoring for Cluster-based Web Data-Centers over RDMA-enabled Networks, OSU-CISRC-7/05-TR49
3 K. Vaidyanathan, P. Balaji, J. Wu, H. Jin, and DK Panda, An Architectural Study of Cluster-Based Multi-Tier Data-Centers
4 S. Krishnamoorthy, P. Balaji, K. Vaidyanathan, H. Jin, and DK Panda, Dynamic Reconfigurability Support for Providing Soft QoS Guarantees in Cluster-based Multi-Tier Data-Centers over InfiniBand

Ph.D. Dissertations (3)

1 S. Narravula, Designing High-Performance and Scalable Distributed Datacenter Services over Modern Interconnects, Aug 2008
2 K. Vaidyanathan, High Performance and Scalable Soft Shared State for Next-Generation Datacenters, May 2008
3 P. Balaji, High Performance Communication Support for Sockets Based Applications over High-Speed Networks, Jun 2006

M.S. Thesis (1)

1 S. Krishnamoorthy, Dynamic Re-Configurability Support to Provide Soft QoS Guarantees in Cluster-Based Multi-Tier Data-Centers over InfiniBand, Jun 2004

Sponsors

NSF
Intel
Mellanox
Dell