CUDA Kernel based Collective Reduction Operations on Large-scale GPU Clusters
C. Chu, K. Hamidouche, A. Venkatesh, A. Awan, D. Panda
16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid'16),
May 2016.