Optimizing MPI Communication on Multi-GPU Systems using CUDA Inter-Process Communication. S. Potluri, H. Wang, D. Bureddy, A. Singh, C. Rosales, D. Panda. International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), May 2012.