High-Performance and Scalable Non-Blocking All-to-All with Collective Offload on InfiniBand Clusters: A Study with Parallel 3D FFT, K. Kandalla, H. Subramoni, K. Tomko, D. Pekurovsky, S. Sur, D. Panda, International Supercomputing Conference '11 (ISC'11), Jun 2011.