Efficient Shared Memory on High-Speed Interconnects


Software Distributed Shared Memory (DSM) systems often perform poorly because of the combined effects of increased communication, slow networks, and the large overhead of processing the coherence protocol. Modern interconnects such as Myrinet, Quadrics, and InfiniBand offer reliable, low-latency, high-bandwidth communication. These networks also support efficient memory-based communication primitives such as Remote Direct Memory Access (RDMA). These capabilities can be leveraged to reduce overhead in a software DSM system.


To implement a cache consistency protocol such as Home-based Lazy Release Consistency (HLRC) efficiently, we have employed RDMA and atomic operations and demonstrated a significant performance improvement. We have also taken on the challenge of developing a communication substrate over GM and VIA so that applications using the TreadMarks DSM package can take advantage of the enhanced communication performance of user-level protocols.

In addition to the shared-memory programming model, Remote Memory Access (RMA) operations support an intermediate programming model between message passing and shared memory. This model combines some advantages of shared memory, such as direct access to shared/global data, with those of message passing, namely control over locality and data distribution. In the context of this model, we study two latency-hiding techniques on the Aggregate Remote Memory Copy Interface (ARMCI): overlapping communication with computation, and coalescing small put/get messages.
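
As a rough illustration of the coalescing idea only, the following minimal sketch buffers small put requests per destination and hands them to the transport as a single batch once a size threshold is reached. It is a toy model and not the ARMCI API: the class name CoalescingPutQueue, the send callback, and the max_bytes threshold are all hypothetical names chosen for this example.

```python
from collections import defaultdict


class CoalescingPutQueue:
    """Toy model (hypothetical, not ARMCI): batch small puts per destination."""

    def __init__(self, send, max_bytes=4096):
        # send(dest, batch) is an assumed transport callback; batch is a
        # list of (offset, data) pairs delivered as one message.
        self.send = send
        self.max_bytes = max_bytes  # assumed per-destination flush threshold
        self.pending = defaultdict(list)
        self.pending_bytes = defaultdict(int)

    def put(self, dest, offset, data):
        # Queue the request instead of sending it immediately.
        self.pending[dest].append((offset, bytes(data)))
        self.pending_bytes[dest] += len(data)
        if self.pending_bytes[dest] >= self.max_bytes:
            self.flush(dest)

    def flush(self, dest=None):
        # Send everything queued for one destination, or for all of them.
        dests = [dest] if dest is not None else list(self.pending)
        for d in dests:
            batch = self.pending.pop(d, [])
            self.pending_bytes.pop(d, None)
            if batch:
                self.send(d, batch)


# Usage: five small puts leave the queue as three messages.
messages = []
q = CoalescingPutQueue(send=lambda dest, batch: messages.append((dest, batch)),
                       max_bytes=16)
for i in range(4):
    q.put(dest=1, offset=8 * i, data=b"x" * 8)
q.put(dest=2, offset=0, data=b"y" * 4)
q.flush()
```

The design trade-off in such a scheme is that a larger threshold amortizes per-message overhead over more requests but delays delivery of the earliest queued put, so an explicit flush is needed before any point where the data must be visible remotely.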

Modern interconnects also give individual nodes the opportunity to exploit remote resources efficiently in a cluster environment. In particular, using remote memory as a swap area can improve application performance significantly, especially for memory-intensive applications. We study the issues involved in using remote memory to enhance the local memory hierarchy with efficient user-level protocols.