HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. B. Ramesh, N. Contini, N. Alnaasan, K. Suresh, M. Abduljabbar, A. Shafi, H. Subramoni, D. Panda. 38th IEEE International Parallel & Distributed Processing Symposium, May 2024.