Training ultra long context language model with fully pipelined distributed transformer
J. Yao, S. Jacobs, M. Tanaka, O. Ruwase, H. Subramoni, D. Panda
The Eighth Annual Conference on Machine Learning and Systems,
May 2025.