High Performance Runtime for PGAS Models (OpenSHMEM, UPC and CAF)

Overview

Partitioned Global Address Space (PGAS) languages are growing in popularity because of their ability to provide shared memory programming model over distributed memory machines. In this model, data can be stored in global arrays and manipulated by individual compute threads. This model shows promise for expressing algorithms that have irregular computation and communication patterns. It is unlikely that such applications written in MPI will be re-written using the emerging PGAS languages in the near future. But, it is more likely that parts of these applications will be converted using newer models. This requires that underlying implementation of system software be able to support multiple programming models simultaneously.

Objectives

In this research work, we propose a Unified Runtime for supporting multiple programming models, Currently our high performance runtime provides support for MPI and UPC (Unified Parallel C) languages. Our high performance runtime provides built-in support for load balancing and work-stealing using multi-end point design. We also proposed features like 'UPC Queues' for expressing irregular applications in UPC, in a high performance manner.

Software Distribution

Link to High Performance Runtime for PGAS Models (OpenSHMEM, UPC and CAF)

Journals (2)
1	DK Panda, H. Subramoni, C. Chu, and M. Bayatpour, The MVAPICH project: Transforming Research into High-Performance MPI Library for HPC Community , Journal of Computational Science (JOCS), Special Issue on Translational Computer Science, Oct 2020.
2	K. Hamidouche, A. Venkatesh, Ammar Awan, H. Subramoni, and DK Panda, CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters, ParCo: Elsevier Parallel Computing Journal ,

Conferences & Workshops (23)
1	Exploring Hybrid MPI+Kokkos Tasks Programming Model Samuel Khuvis, K. Tomko, J. Hashmi, and DK Panda, The 3rd Annual Parallel Applications Workshop, Alternatives to MPI+X (PAW-ATM), Nov 2020 [held in conjunction with SC’20] [Bib - Plain]
2	Performance of PGAS Models on KNL: A Comprehensive Study with MVAPICH2-X J. Hashmi, M. Li, H. Subramoni, and DK Panda, Intel Xeon Phi User's Group (IXPUG) 2017, Sep 2017 [Bib - Plain]
3	Exploiting and Evaluating OpenSHMEM on KNL Architecture J. Hashmi, M. Li, H. Subramoni, and DK Panda, Fourth Workshop on OpenSHMEM and Related Technologies, Aug 2017 [Bib - Plain]
4	Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models J. Hashmi, K. Hamidouche, and DK Panda, 18th IEEE International Conference on High Performance Computing and Communications (HPCC'16), Dec 2016 [Bib - Plain]
5	OpenSHMEM NonBlocking Data Movement Operations with MVAPICH2-X: Early Experiences K. Hamidouche, J. Zhang, K. Tomko, and DK Panda, PGAS Applications Workshop, Nov 2016 [Bib - Plain]
6	High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR M. Li, K. Hamidouche, X. Lu, J. Zhang, J. Lin, and DK Panda, HiPC '15, Dec 2015 [Bib - Plain]
7	A Case for Non-Blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X Ammar Awan, K. Hamidouche, C. Chu, and DK Panda, OpenSHMEM 2015 for PGAS Programming in the Exascale Era, Aug 2015 [Bib - Plain]
8	Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM J. Lin, K. Hamidouche, J. Zhang, X. Lu, A. Vishnu, and DK Panda, OpenSHMEM 2015 for PGAS Programming in the Exascale Era, Aug 2015 [Bib - Plain]
9	High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation J. Lin, K. Hamidouche, X. Lu, M. Li, and DK Panda, HIPS '15 (IPDPS Workshop), May 2015 [Bib - Plain]
10	On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI S. Chakraborty, H. Subramoni, J. Perkins, Ammar Awan, and DK Panda, HIPS '15 (IPDPS Workshop), May 2015 [Bib - Plain]
11	Scalable MiniMD Design with Hybrid MPI and OpenSHMEM M. Li, J. Lin, X. Lu, K. Hamidouche, K. Tomko, and DK Panda, OUG '14 (Co-located with PGAS), Oct 2014 [Bib - Plain]
12	Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models J. Jose, S. Potluri, H. Subramoni, X. Lu, K. Hamidouche, K. Schulz, H. Sundar, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '14), Oct 2014 [Bib - Plain]
13	High Performance OpenSHMEM for MIC Clusters: Extensions, Runtime Designs, and Application Co-Design J. Jose, K. Hamidouche, X. Lu, S. Potluri, J. Zhang, K. Tomko, and DK Panda, IEEE CLUSTER’14, Sep 2014 [Bib - Plain]
14	A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters J. Jose, J. Zhang, A. Venkatesh, S. Potluri, and DK Panda, OpenSHMEM Workshop, Mar 2014 [Bib - Plain]
15	UPC on MIC: Early Experiences with Native and Symmetric Modes M. Luo, M. Li, A. Venkatesh, X. Lu, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '13), Oct 2013 [Bib - Plain]
16	Optimizing Collective Communication in OpenSHMEM J. Jose, K. Kandalla, S. Potluri, J. Zhang, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '13), Oct 2013 [Bib - Plain]
17	Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models J. Jose, S. Potluri, K. Tomko, and DK Panda, International Supercomputing Conference (ISC '13), Jun 2013 [Slides] [Bib - Plain]
18	Extending OpenSHMEM for GPU Computing S. Potluri, D. Bureddy, H. Wang, H. Subramoni, and DK Panda, International Parallel and Distributed Processing Symposium (IPDPS '13), May 2013 [Slides] [Bib - Plain]
19	Multi-Threaded UPC Runtime for GPU to GPU communication over InfiniBand M. Luo, H. Wang, and DK Panda, International Conference on Partitioned Global Address Space Programming Models (PGAS '12), Oct 2012 [Slides] [Bib - Plain]
20	Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation J. Jose, K. Kandalla, M. Luo, and DK Panda, International Conference on Parallel Processing (ICPP '12), Sep 2012 [Bib - Plain]
21	Multi-threaded UPC Runtime with Network Endpoints: Design Alternatives and Evaluation on Multi-core Architectures M. Luo, J. Jose, S. Sur, and DK Panda, International Conference on High Performance Computing (HiPC '11), Dec 2011 [Slides] [Bib - Plain]
22	UPC Queues for Scalable Graph Traversals: Design and Evaluation on InfiniBand Clusters J. Jose, S. Potluri, M. Luo, S. Sur, and DK Panda, Fifth Conference on Partitioned Global Address Space Programming Model (PGAS '11), Oct 2011 [Slides] [Bib - Plain]
23	Unifying UPC and MPI Runtimes: Experience with MVAPICH J. Jose, M. Luo, S. Sur, and DK Panda, International Workshop on Partitioned Global Address Space (PGAS '10), Oct 2010 [Slides] [Bib - Plain]

Ph.D. Disserations (6)
1	C. Chu, Accelerator-enabled Communication Middleware for Large-scale Heterogeneous HPC Systems with Modern Interconnects, Jul 2020
2	J. Hashmi, Designing High Performance Shared-Address-Space and Adaptive Communication Middlewares for Next-Generation HPC Systems, Apr 2020
3	S. Chakraborty, High Performance and Scalable Cooperative Communication Middleware for Next Generation Architectures, Jun 2019
4	M. Li, Designing High-Performance Remote Memory Access for MPI and PGAS Models with Modern Networking Technologies on Heterogeneous Clusters, Nov 2017
5	J. Jose, Designing High Performance and Scalable Unified Communication Runtime (UCR) for HPC and Big Data Middleware, Aug 2014
6	S. Potluri, Enabling Efficient Use of MPI and PGAS Programming Models on Heterogeneous Clusters with High Performance Interconnects, May 2014

M.S. Thesis (1)
1	S. Srivastava, MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library, May 2021

NOWLAB: Network Based Computing Lab