General-purpose graphics processing units (GPUs) are becoming an
integral part of modern system architectures. They drive the
peak performance of the fastest supercomputers in the world and
speed up a wide spectrum of applications. Although GPUs provide
very high peak FLOPS, data movement between host and GPU, and between GPUs, remains a bottleneck for both performance and programmer productivity. The Message Passing Interface (MPI) has been the de facto standard for parallel application development in the high-performance computing domain, and many MPI applications are being ported to run on clusters with GPUs for higher performance. Our project aims to simplify this task by supporting standard MPI communication from GPU device memory through the
MVAPICH2 MPI library. While supporting advanced MPI features such as collective communication, user-defined datatypes, and one-sided communication, MVAPICH2 aims to optimize data movement between host and GPU, and between GPUs, with minimal or no overhead for the application developer.
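To make "MPI from GPU device memory" concrete, the following is a minimal sketch of a two-rank exchange in which the buffer handed to MPI_Send and MPI_Recv is allocated with cudaMalloc. It assumes a CUDA-enabled build of MVAPICH2 (1.8 or later) and exactly two MPI ranks. The kernel name `fill`, the message size `N`, and the value 42.0 are illustrative choices, not part of MVAPICH2.

```cuda
// Sketch: two-rank exchange of a GPU-resident buffer through a CUDA-aware MPI.
// Assumptions: CUDA-enabled MVAPICH2 build, exactly two ranks, default GPU per rank.
#include <mpi.h>
#include <cuda_runtime.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 20)  // illustrative message size, in floats

// Fill the device buffer with a known value so the receiver can check it.
__global__ void fill(float *buf, int n, float value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if (size != 2) {
        if (rank == 0) fprintf(stderr, "run with exactly 2 ranks\n");
        MPI_Finalize();
        return EXIT_FAILURE;
    }

    // The communication buffer lives in GPU device memory on both ranks.
    float *d_buf = NULL;
    if (cudaMalloc((void **)&d_buf, N * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "rank %d: cudaMalloc failed\n", rank);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    if (rank == 0) {
        fill<<<(N + 255) / 256, 256>>>(d_buf, N, 42.0f);
        cudaDeviceSynchronize();  // kernel must finish before MPI reads d_buf
        // The device pointer is passed directly; no staging copy to a host buffer.
        MPI_Send(d_buf, N, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(d_buf, N, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        // Copy one element back to the host only to verify the transfer.
        float first = 0.0f;
        cudaMemcpy(&first, d_buf, sizeof(float), cudaMemcpyDeviceToHost);
        printf("rank 1: first received element = %.1f\n", first);
    }

    cudaFree(d_buf);
    MPI_Finalize();
    return EXIT_SUCCESS;
}
```

Without GPU-aware support in the MPI library, the application would have to copy the data to a host buffer with cudaMemcpy before MPI_Send and back to the device after MPI_Recv. In MVAPICH2, GPU support is typically switched on at run time with the MV2_USE_CUDA=1 environment parameter. The exact compile and launch commands depend on the local installation.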
Support for MPI communication from GPUs has been available in public releases of MVAPICH2 since version 1.8. The OSU Micro Benchmarks (OMB) have been extended to evaluate MPI communication between GPU and host, and between two GPUs. Some performance results using OMB and the latest release of MVAPICH2 are presented here. This effort is funded by NVIDIA Corporation.
Conferences/Workshops:
-
S. Potluri, H. Wang, D. Bureddy, A. K. Singh, C. Rosales and D. K. Panda,
Optimizing MPI Communication on Multi-GPU Systems using CUDA Inter-Process Communication,
Int'l Workshop on Accelerators and Hybrid Exascale Systems (AsHES),
held in conjunction with Int'l Parallel and Distributed Processing Symposium (IPDPS '12), May 2012.
-
A. Singh, S. Potluri, H. Wang, K. Kandalla, S. Sur and D. K. Panda,
MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives
and Benefits,
Workshop on Parallel Programming on Accelerator Clusters (PPAC '11),
held in conjunction with Cluster '11, Sept. 2011.
-
H. Wang, S. Potluri, M. Luo, A. Singh, X. Ouyang, S. Sur and
D. K. Panda,
Optimized Non-contiguous MPI Datatype Communication for GPU Clusters:
Design, Implementation and Evaluation with MVAPICH2,
IEEE Cluster '11, Sept. 2011.
-
H. Wang, S. Potluri, M. Luo, A. Singh,
S. Sur and D. K. Panda,
MVAPICH2-GPU: Optimized GPU to GPU Communication for
InfiniBand Clusters,
Int'l Supercomputing Conference (ISC), June 2011.
-
D. Bureddy, H. Wang, A. Venkatesh, S. Potluri and D. K. Panda,
OMB-GPU: A Micro-benchmark suite for Evaluating MPI Libraries on GPU Clusters,
EuroMPI 2012, September 2012.