Overview

Software Distributed Shared Memory (DSM) systems have traditionally performed poorly because of the combined effects of increased communication, slow networks, and the large overhead of processing the coherence protocol. Modern interconnects such as Myrinet, Quadrics, and InfiniBand offer reliable, low-latency, high-bandwidth communication. These networks also support efficient memory-based communication primitives such as Remote Direct Memory Access (RDMA). These features can be leveraged to reduce overhead in a software DSM system.

Description

To implement a cache consistency protocol such as Home-based Lazy Release Consistency (HLRC) efficiently, we have employed RDMA and atomic operations and demonstrated a significant performance improvement. We have also taken on the challenge of developing a communication substrate over GM and VIA so that applications using the TreadMarks DSM package can take advantage of the enhanced communication performance of user-level protocols.
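As a rough illustration of why one-sided writes suit a home-based protocol, the sketch below models the twin/diff step in plain Python. It is a toy model under stated assumptions, not our implementation: the function names are hypothetical, a `bytearray` stands in for registered memory on the home node, and `apply_diff_at_home` stands in for an RDMA write.

```python
# Toy model of the twin/diff step in a home-based protocol.
# All names are hypothetical; no real RDMA or DSM library is used.

PAGE_SIZE = 16  # tiny page so the example is easy to follow


def make_twin(page):
    """Snapshot a page before the first local write to it."""
    return bytes(page)


def compute_diff(twin, page):
    """Return (offset, new_byte) pairs for every byte that changed."""
    return [(i, new) for i, (old, new) in enumerate(zip(twin, page)) if old != new]


def apply_diff_at_home(home_page, diff):
    """Stand-in for a one-sided write of the diff into the home copy."""
    for offset, value in diff:
        home_page[offset] = value


home = bytearray(PAGE_SIZE)   # home node's master copy of the page
local = bytearray(home)       # writer's cached copy
twin = make_twin(local)       # taken on the first write fault

local[3] = 7
local[10] = 42

diff = compute_diff(twin, local)   # computed at release time
apply_diff_at_home(home, diff)     # home copy is now up to date
```

Because only the changed bytes travel to the home node, the home side does no protocol processing in this model, which is the property a one-sided transfer is meant to exploit.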

In addition to the shared-memory programming model, Remote Memory Access (RMA) operations enable an intermediate programming model between message passing and shared memory. This model combines some advantages of shared memory, such as direct access to shared/global data, with those of message passing, namely control over locality and data distribution. In the context of this model, we study latency-hiding techniques on the Aggregate Remote Memory Copy Interface (ARMCI): overlapping communication with computation and coalescing small put/get messages.
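The coalescing idea can be sketched as follows. This is a minimal simulation, assuming a hypothetical `CoalescingPutBuffer` class and a `bytearray` in place of remote memory; it does not use the ARMCI API, and the 32-byte threshold is an arbitrary choice for the example.

```python
class CoalescingPutBuffer:
    """Hypothetical helper that batches small puts into one transfer."""

    def __init__(self, remote, threshold_bytes):
        self.remote = remote              # bytearray standing in for remote memory
        self.threshold = threshold_bytes  # flush once this many bytes are pending
        self.pending = []                 # queued (offset, data) pairs
        self.pending_bytes = 0
        self.transfers = 0                # number of simulated network messages

    def put(self, offset, data):
        """Queue a small put; send only when enough data has accumulated."""
        self.pending.append((offset, bytes(data)))
        self.pending_bytes += len(data)
        if self.pending_bytes >= self.threshold:
            self.flush()

    def flush(self):
        """Send all queued puts as one aggregate message."""
        if not self.pending:
            return
        for offset, data in self.pending:
            self.remote[offset:offset + len(data)] = data
        self.transfers += 1
        self.pending.clear()
        self.pending_bytes = 0


remote_mem = bytearray(64)
buf = CoalescingPutBuffer(remote_mem, threshold_bytes=32)
for i in range(8):
    buf.put(i * 4, bytes([i] * 4))   # eight 4-byte puts
buf.flush()                           # completion point, e.g. before a sync
```

In this model the eight small puts cost one transfer instead of eight. The trade-off is that data is not visible remotely until the flush, so a real design has to flush at every synchronization point.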

Modern interconnects also give individual nodes the opportunity to exploit remote resources efficiently in a cluster environment. In particular, using remote memory as a swap area can improve application performance significantly, especially for memory-intensive applications. We study the issues involved in using remote memory to extend the local memory hierarchy with efficient user-level protocols.
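To make the remote-swap idea concrete, here is a minimal sketch, assuming a hypothetical `RemoteSwapPager` with LRU eviction. A dictionary stands in for memory on another node, and the LRU policy and 4096-byte page size are assumptions made for the example rather than details of our design.

```python
from collections import OrderedDict


class RemoteSwapPager:
    """Hypothetical pager that evicts pages to 'remote memory', not disk."""

    def __init__(self, local_frames):
        self.local_frames = local_frames
        self.local = OrderedDict()   # page_id -> data, oldest entry first
        self.remote = {}             # stand-in for memory on another node
        self.remote_reads = 0
        self.remote_writes = 0

    def access(self, page_id):
        """Return the page, swapping it in from remote memory on a miss."""
        if page_id in self.local:
            self.local.move_to_end(page_id)   # hit: mark most recently used
            return self.local[page_id]
        if page_id in self.remote:
            data = self.remote.pop(page_id)   # swap in from remote memory
            self.remote_reads += 1
        else:
            data = bytearray(4096)            # first touch: zero-filled page
        if len(self.local) >= self.local_frames:
            victim, victim_data = self.local.popitem(last=False)
            self.remote[victim] = victim_data  # swap out the LRU page
            self.remote_writes += 1
        self.local[page_id] = data
        return data


pager = RemoteSwapPager(local_frames=2)
pager.access(0)[0] = 1        # write to page 0
pager.access(1)
pager.access(2)               # local memory is full: page 0 is swapped out
value = pager.access(0)[0]    # page 0 comes back from remote memory
```

Each swap in or out in this model is a single remote transfer, so the benefit over disk swap depends on how much faster the interconnect is than the local disk.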

Journals (1)
1	K. Khorassani, C. Chen, B. Ramesh, A. Shafi, H. Subramoni, and DK Panda, High Performance MPI over the Slingshot Interconnect, Special Issue of Journal of Computer Science and Technology (JCST), Feb 2023.

Conferences & Workshops (2)
1	A. Ruhela, B. Ramesh, S. Chakraborty, H. Subramoni, J. Hashmi, and DK Panda, Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast, Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware, Nov 2019.
2	J. Hashmi, S. Chakraborty, M. Bayatpour, H. Subramoni, and DK Panda, Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores, 32nd IEEE International Parallel & Distributed Processing Symposium (IPDPS '18), May 2018.

NOWLAB: Network Based Computing Lab

Efficient Shared Memory on High-Speed Interconnects
