The recent advances in Artificial Intelligence and Big Data Analytics are driven by large amounts of data and computing hardware at extreme scale. The High Performance Computing (HPC) community, meanwhile, has been tackling similar extreme-scale computing challenges for several decades. The purpose of this workshop is to bring together researchers and engineers from the Artificial Intelligence, Big Data, and HPC communities on a common platform to share the latest challenges and opportunities. The focus of the workshop is designing, implementing, and evaluating Artificial Intelligence and Big Data workloads on massively parallel hardware equipped with multi-/many-core CPUs and GPUs and connected by high-speed, low-latency networks such as InfiniBand, Omni-Path, and Slingshot. The Artificial Intelligence workloads comprise training and inference of Deep Neural Networks, as well as traditional models, using a range of state-of-the-art Machine and Deep Learning frameworks including Scikit-learn, PyTorch, TensorFlow, ONNX, and TensorRT. On the Big Data side, popular frameworks such as Apache Spark, Dask, and Ray are used to process large amounts of data on CPUs and GPUs to conduct insightful analyses.

All times are in Eastern Daylight Time (EDT).

Workshop Program

9:00 - 9:05

Opening Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University

9:05 - 10:00

Keynote

Speaker: Dan Stanzione, Texas Advanced Computing Center (TACC)

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Horizon: The NSF Leadership Computing Facility and the National AI Research Resource

Abstract: In this talk, I’ll cover upcoming developments in computing research infrastructure to support future Big Data/AI/HPC platforms, including design inputs for future systems. I’ll also share early experience with the Vista system, one of the first deployments of NVIDIA’s Grace CPU and Grace Hopper CPU/GPU integration, and touch on the programming models we are supporting, shifts in user workloads, and upcoming research directions.

Speaker Bio: Dan Stanzione has been Associate Vice President for Research at The University of Texas at Austin since 2018 and Executive Director of the Texas Advanced Computing Center (TACC) since 2014. He is a nationally recognized leader in high performance computing and the principal investigator (PI) for a National Science Foundation (NSF) grant to acquire and deploy Frontera, the fastest supercomputer at any U.S. university. Stanzione is also the PI of TACC's Stampede2 and Wrangler systems, supercomputers for high performance computing and for data-focused applications, respectively. For six years he was co-PI of CyVerse, a large-scale NSF life sciences cyberinfrastructure, and he was a co-PI for TACC's Ranger and Lonestar supercomputers, large-scale NSF systems previously deployed at UT Austin. Stanzione received his bachelor's degree in electrical engineering and his master's degree and doctorate in computer engineering from Clemson University.

10:00 - 10:30

Speaker: Olatunji Ruwase, Microsoft

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Enabling Efficient Trillion Parameter Scale Training for Deep Learning Models

Abstract: Deep Learning (DL) is driving unprecedented progress in a wide range of Artificial Intelligence domains, including natural language processing, vision, speech, and multimodal applications. Sustaining this rapid pace of AI innovation, however, requires practical solutions to the extreme demands that model scaling places on the compute, memory, communication, and storage components of modern computing hardware. To address this challenge, we created a deep learning optimization library called DeepSpeed that makes distributed model training and inference efficient, effective, and easy on commodity hardware. This talk will focus on DeepSpeed optimizations for improving the memory and compute efficiency of extreme-scale model training.
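
As background for the kind of optimization the talk covers, the snippet below is a minimal sketch of wrapping a PyTorch model with DeepSpeed's ZeRO memory optimizations. The toy model, batch size, learning rate, and ZeRO stage are illustrative assumptions, not details from the talk.

```python
# Minimal sketch: wrapping a PyTorch model with DeepSpeed. The toy
# model, batch size, and ZeRO stage are illustrative assumptions.
import torch
import deepspeed

# Stand-in for a large transformer; real workloads would be far bigger.
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.GELU(),
    torch.nn.Linear(4096, 1024),
)

ds_config = {
    "train_batch_size": 32,
    "fp16": {"enabled": True},
    # ZeRO stage 3 partitions optimizer state, gradients, and parameters
    # across data-parallel ranks to reduce per-GPU memory.
    "zero_optimization": {"stage": 3},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(32, 1024, device=engine.device, dtype=torch.half)
loss = engine(x).float().pow(2).mean()  # dummy loss for illustration
engine.backward(loss)  # DeepSpeed handles loss scaling and partitioning
engine.step()          # optimizer step plus ZeRO bookkeeping
```

A script like this is typically run under the deepspeed launcher (e.g., `deepspeed train.py`) so that the distributed environment is initialized across ranks.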

Speaker Bio: Olatunji (Tunji) Ruwase is the lead and co-founder of the DeepSpeed project at Microsoft. His broad industry and research background spans compilers, operating systems, and hardware accelerators. His current focus is on systems and convergence optimizations, and frameworks for efficient distributed training and inference of deep learning models. His research results on DL training, inference, and hyperparameter search are used in multiple Microsoft systems and products, such as Azure, Ads, Bing, Catapult, and HyperDrive. Tunji earned a PhD in Computer Science from Carnegie Mellon University under the guidance of Professor Todd Mowry.

10:30 - 11:00

Break

11:00 - 11:30

Speaker: Feiyi Wang, Oak Ridge National Laboratory (ORNL)

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Large-Scale and Energy-Efficient AI for Science at a Leadership Computing Facility: Observations and Thoughts

Abstract: This talk will quantitatively discuss the computational requirements, energy-efficiency concerns, challenges, and observations involved in supporting large-scale AI for science on leadership computing platforms such as the Frontier system.

Speaker Bio: Feiyi Wang received his Ph.D. in Computer Engineering from North Carolina State University (NCSU). He is a Distinguished Research Scientist and Group Leader of the Analytics and AI Methods at Scale (AAIMS) group at the National Center for Computational Sciences of Oak Ridge National Laboratory (ORNL). His research interests include distributed machine learning and benchmarking, high-performance storage systems, and parallel I/O and file systems. He is the recipient of the SC'21 Best Paper Award and the Bench'21 Best Paper Award, was an SC'21 and SC'22 Gordon Bell COVID-19 Special Prize finalist, and was a Best Paper Finalist at HPCC'17, SBAC-PAD'16, and SC'14. In 2022, he won the UT-Battelle Award for Research Accomplishment and Distinguished Innovation, as well as the prestigious Director's Award.

Dr. Wang holds a Joint Faculty Professor appointment in the ECE Department and a Bredesen Center faculty position at the University of Tennessee. He is also a Senior Member of the IEEE.

11:30 - 12:00

Speaker: Murali Krishna Emani, Argonne National Laboratory

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Toward a Holistic Performance Evaluation of Large Language Models Across Diverse AI Accelerators

Abstract: Artificial intelligence (AI) methods have become critical in scientific applications, helping to accelerate scientific discovery. Large language models (LLMs) are considered a promising approach to some challenging problems because of their superior generalization capabilities across domains. The effectiveness of the models and the accuracy of the applications are contingent upon their efficient execution on the underlying hardware infrastructure. Specialized AI accelerator hardware systems have recently become available for accelerating AI applications, but the comparative performance of these accelerators on large language models has not been previously studied. In this work, we systematically study LLMs on multiple AI accelerators as well as GPUs and evaluate their performance characteristics for these models. We evaluate these systems with (i) a micro-benchmark using a core transformer block, (ii) a GPT model, and (iii) an LLM-driven science use case, GenSLM.

Next, I will focus on recent work evaluating the inference capabilities of these systems, along with an orthogonal comparison across various implementation frameworks. I will present our findings and analyses of the models’ performance to better understand the intrinsic capabilities of AI accelerators in this benchmarking effort.
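
To make item (i) concrete, here is a hedged sketch of a transformer-block micro-benchmark in PyTorch; the layer dimensions, batch size, and timing method are assumptions for illustration, not the setup used in the study.

```python
# Hedged sketch of a transformer-block micro-benchmark; the block
# dimensions, batch size, and iteration counts are assumptions.
import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# A single core transformer block (GPT-like sizes, chosen arbitrarily).
block = torch.nn.TransformerEncoderLayer(
    d_model=2048, nhead=16, dim_feedforward=8192, batch_first=True
).to(device)
block.eval()  # disable dropout for repeatable timings

x = torch.randn(8, 512, 2048, device=device)  # (batch, seq_len, d_model)

with torch.no_grad():
    for _ in range(10):           # warm-up: exclude one-time setup costs
        block(x)
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU work

    iters = 100
    start = time.perf_counter()
    for _ in range(iters):
        block(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = (time.perf_counter() - start) / iters

print(f"mean forward latency: {elapsed * 1e3:.2f} ms")
```

Sweeping such a kernel across batch sizes and sequence lengths is a common way to expose the intrinsic compute and memory characteristics of a given accelerator before running full models.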

Speaker Bio: Murali Emani is a Computer Scientist in the AI/ML group at the Argonne Leadership Computing Facility (ALCF). Previously, he was a Postdoctoral Research Staff Member at Lawrence Livermore National Laboratory. His research interests are in scalable machine learning, AI accelerators, high-performance computing, and performance optimization. Murali co-leads the ALCF AI Testbed, which explores the performance and efficiency of novel AI accelerators for scientific machine learning applications. He also co-chairs the MLPerf HPC group at MLCommons to benchmark large-scale ML on HPC systems.

12:00 - 12:30

Speaker: Abhinav Bhatele, University of Maryland

Session Chair: Aamir Shafi, The Ohio State University

Title: A 4D Hybrid Algorithm to Scale Parallel Training to Thousands of GPUs

Abstract: Training large language models (LLMs) has become a highly compute-intensive domain of AI research, requiring clusters with tens of thousands of GPUs. Heavy communication, in particular collective operations, can become a critical performance bottleneck when scaling the training of multi-billion-parameter neural networks to large-scale parallel systems. In this talk, I will present a novel 3D tensor + data parallel hybrid algorithm implemented in a highly scalable, open-source framework called AxoNN. I will describe several performance optimizations in AxoNN that overlap non-blocking collectives with computation, and the use of performance models to identify performance-optimal configurations within the large search space defined by our 4D algorithm. These have resulted in unprecedented scaling and peak flop/s on NVIDIA A100 and AMD MI250X GPU-based supercomputers.
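
The overlap optimization mentioned in the abstract can be illustrated with plain torch.distributed primitives. The following is a generic sketch of the idea, not AxoNN's actual implementation; the tensor sizes and launch assumptions (a torchrun-style launch with the NCCL backend) are illustrative.

```python
# Generic sketch of overlapping a non-blocking collective with
# computation. This is NOT AxoNN's implementation; tensor sizes and
# the torchrun/NCCL launch assumptions are illustrative.
import torch
import torch.distributed as dist

dist.init_process_group(backend="nccl")  # assumes a torchrun-style launch
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

grad = torch.randn(4096, 4096, device="cuda")    # gradients to reduce
act = torch.randn(4096, 4096, device="cuda")     # independent activations
weight = torch.randn(4096, 4096, device="cuda")

# Launch the all-reduce asynchronously: it runs on the communication
# stream while the matmul below proceeds on the compute stream.
handle = dist.all_reduce(grad, op=dist.ReduceOp.SUM, async_op=True)

out = act @ weight  # computation that does not depend on `grad`

handle.wait()                    # block only when the result is needed
grad /= dist.get_world_size()    # convert the sum into an average

dist.destroy_process_group()
```

Hiding collectives behind independent computation in this way is the basic mechanism behind the communication/computation overlap the talk describes at much larger scale.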

Speaker Bio: Abhinav Bhatele is an associate professor in the Department of Computer Science and director of the Parallel Software and Systems Group at the University of Maryland, College Park. His research interests are broadly in systems and AI, with a focus on parallel computing and distributed AI. He has published research on parallel programming models and runtimes, network design and simulation, applications of machine learning to parallel systems, parallel deep learning, and analyzing, visualizing, modeling, and optimizing the performance of parallel software and systems. Abhinav has received best paper awards at Euro-Par 2009, IPDPS 2013, IPDPS 2016, and PDP 2024, and a best poster award at SC 2023. He was selected as a recipient of the IEEE TCSC Award for Excellence in Scalable Computing (Early Career) in 2014, the LLNL Early and Mid-Career Recognition award in 2018, the NSF CAREER award in 2021, the IEEE TCSC Award for Excellence in Scalable Computing (Middle Career) in 2023, and the UIUC CS Early Career Academic Achievement Alumni Award in 2024.

Abhinav received a B.Tech. degree in Computer Science and Engineering from I.I.T. Kanpur, India in May 2005, and M.S. and Ph.D. degrees in Computer Science from the University of Illinois at Urbana-Champaign in 2007 and 2010, respectively. He was a postdoc and later a computer scientist in the Center for Applied Scientific Computing at Lawrence Livermore National Laboratory from 2011 to 2019. Abhinav is an associate editor of the IEEE Transactions on Parallel and Distributed Systems (TPDS). He was one of the General Chairs of IEEE Cluster 2022 and the Research Papers Chair of ISC 2023.

12:30 - 12:35

Closing Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University