Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning. A. Awan, K. Hamidouche, A. Venkatesh, and D. Panda. The 23rd European MPI Users' Group Meeting (EuroMPI 16), Sep 2016.