HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions
B. Ramesh, N. Contini, N. Alnaasan, K. Suresh, M. Abduljabbar, A. Shafi, H. Subramoni, D. Panda
38th IEEE International Parallel & Distributed Processing Symposium,
May 2024.