Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference
J. Yao, Q. Anthony, A. Shafi, H. Subramoni, D. Panda
38th IEEE International Parallel & Distributed Processing Symposium,
May 2024.