High Performance MPI for Clusters with GPUs
Network-based Computing Laboratory
Department of Computer Science and Engineering
Ohio State University

General-purpose Graphics Processing Units (GPUs) are becoming an integral part of modern system architectures. They are raising the peak performance of the fastest supercomputers in the world and speeding up a wide spectrum of applications. Although GPUs provide very high peak FLOPS, data movement between host and GPU, and between GPUs, remains a bottleneck for both performance and programmer productivity. The Message Passing Interface (MPI) is the de facto standard for parallel application development in the High Performance Computing domain, and many MPI applications are being ported to run on clusters with GPUs for higher performance. Our project aims to simplify this task by supporting standard MPI communication directly from GPU device memory through the MVAPICH2 MPI library. While supporting advanced MPI features such as collective communication, user-defined datatypes, and one-sided communication, MVAPICH2 aims to optimize data movement between host and GPU, and between GPUs, with minimal or no overhead for the application developer.

Support for MPI communication from GPUs has been available in public releases of MVAPICH2 starting from version 1.8. The OSU Micro Benchmarks (OMB) have been extended to evaluate MPI communication between GPU and host, and between two GPUs. Some performance results using OMB and the latest release of MVAPICH2 are presented here. This effort is funded by NVIDIA Corporation.