Recent advances in Artificial Intelligence and Big Data Analytics are driven by large amounts of data and computing hardware at extreme scale. Meanwhile, the High Performance Computing (HPC) community has been tackling similar extreme-scale computing challenges for several decades. The purpose of this workshop is to bring together researchers and engineers from the Artificial Intelligence, Big Data, and HPC communities on a common platform to share the latest challenges and opportunities. The focus of the workshop is designing, implementing, and evaluating Artificial Intelligence and Big Data workloads on massively parallel hardware equipped with multi/many-core CPUs and GPUs and connected with high-speed, low-latency networks such as InfiniBand, Omni-Path, and Slingshot. The Artificial Intelligence workloads comprise training and inference of Deep Neural Networks as well as traditional models using a range of state-of-the-art Machine and Deep Learning frameworks, including Scikit-learn, PyTorch, TensorFlow, ONNX, and TensorRT. In addition, popular Big Data frameworks such as Apache Spark, Dask, and Ray are used to process large amounts of data on CPUs and GPUs to conduct insightful analysis.
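As a hedged illustration of the kind of training workload in scope, the sketch below shows multi-GPU data-parallel training with PyTorch's DistributedDataParallel over the NCCL backend, which typically communicates over high-speed interconnects such as InfiniBand. The model, data, and hyperparameters are placeholders, not anything specific to the workshop talks.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # One process per GPU (e.g., launched with torchrun); NCCL handles
    # inter-GPU communication over the node's high-speed interconnect.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)
    device = f"cuda:{local_rank}"

    model = torch.nn.Linear(1024, 10).to(device)      # placeholder model
    model = DDP(model, device_ids=[local_rank])
    opt = torch.optim.SGD(model.parameters(), lr=0.01)

    for _ in range(10):                                # toy loop on random data
        x = torch.randn(32, 1024, device=device)
        y = torch.randint(0, 10, (32,), device=device)
        loss = torch.nn.functional.cross_entropy(model(x), y)
        opt.zero_grad()
        loss.backward()   # gradients are all-reduced across ranks here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

A script like this would typically be launched with one process per GPU, for example `torchrun --nproc_per_node=4 train.py` (the script name and GPU count are illustrative).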

All times in Pacific Daylight Time (PDT)

Workshop Program

1:30 - 1:35 PM

Opening Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University

1:35 - 2:30 PM

Keynote

Speaker: Leon Song, Microsoft

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: DeepSpeed4Science: Enabling System Support for Large Signature AI4Science Models at Scale

Abstract: With the new era of AIGC and large-scale language models changing the landscape of scientific discovery, DeepSpeed@Microsoft is establishing a new initiative with our partners from industry, academia, and federal research labs to enable new science-driven system technologies that support large-scale scientific discovery through ML-driven models. In this talk, I will cover several of our signature models and engagements from the first release of DeepSpeed4Science and discuss which DeepSpeed system technologies are coming in the near future. We would like to make DeepSpeed4Science a marketplace where scientists around the world can quickly acquire the essential system technologies that bottleneck their model development and deployment, and also contribute to this new initiative from Microsoft.

Speaker Bio: Shuaiwen Leon Song is a senior principal scientist and manager at Microsoft. He leads the DeepSpeed4Science initiative, a broad engagement between Microsoft, Microsoft Research, DoE labs, academia, and industry partners to enable sophisticated system technology research and development supporting training and inference for large-scale AI-driven scientific models. At DeepSpeed, he also drives or co-drives several pathfinding projects and releases (e.g., ZeRO inference, scalable dialogue system design, and DeepSpeed Chat) and co-manages the Brainwave team. Prior to Microsoft, he was the SOAR associate professor at the University of Sydney and an adjunct professor at the University of Washington. His past work in HPC has received several best paper nominations and was featured in U.S. DoE research highlights and other media outlets. He has received several awards, including the IEEE early-career award for HPC, the IEEE mid-career award for scalable computing, a Facebook faculty award, a Google Brain faculty award, the Australian Most Innovative Engineer award, and an AIR global faculty award. He is also an ACM Distinguished Speaker.
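For context on the DeepSpeed system technologies mentioned above, here is a minimal, hedged sketch of how a DeepSpeed training engine with ZeRO optimization is commonly set up; the model, batch size, optimizer settings, and ZeRO stage are illustrative assumptions, not details from the talk.

```python
import torch
import deepspeed

# Illustrative DeepSpeed configuration (values are assumptions).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {"stage": 2},   # partition optimizer state + gradients
}

model = torch.nn.Linear(1024, 10)        # placeholder model

# deepspeed.initialize wraps the model in a distributed engine that applies
# ZeRO partitioning and the optimizer described by ds_config.
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

x = torch.randn(8, 1024, device=engine.device)
y = torch.randint(0, 10, (8,), device=engine.device)
loss = torch.nn.functional.cross_entropy(engine(x), y)
engine.backward(loss)   # engine handles gradient partitioning/accumulation
engine.step()
```

Such a script is normally started with the DeepSpeed launcher (e.g., `deepspeed train.py`), which spawns one process per GPU.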

2:30 - 3:00 PM


Speaker: Steven Farrell, NERSC

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Enabling AI and Analytics on Big Science Data at NERSC

Abstract: AI and Big Data analytics can enhance and accelerate scientific discovery. As these methodologies and applications mature and grow more complex, they increasingly require high performance computing (HPC) resources for developing and deploying models. NERSC is the mission HPC facility for the DOE Office of Science and provides advanced supercomputing resources for open science across many domains including materials sciences, biosciences, earth sciences, and physics. This presentation will describe our vision at NERSC and the associated challenges for enabling and supporting cutting-edge scientific AI and Big Data analytics through advanced HPC system deployments, research engagements, benchmark challenges and datasets, and outreach.

Speaker Bio: Steven Farrell is a Machine Learning Engineer at NERSC, Lawrence Berkeley National Laboratory, where he leads the AI Services Team in the Data and Analytics Services Group. Steven supports AI workloads on NERSC supercomputers and works with scientists to advance scientific AI research at large scale.

3:00 - 3:30 PM

Break

3:30 - 4:00 PM


Speaker: Zhao Zhang, TACC

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Facilitating Emerging Deep Learning Paradigms at Texas Advanced Computing Center

Abstract: We have seen the fusion of HPC and deep learning (DL) over the past few years: the powerful memory and communication architectures of HPC systems have been shown to support DL applications well, and scientists are exploring and exploiting DL to solve domain research challenges. In this talk, I will share our experience using TACC’s hardware, software, and human effort to facilitate emerging deep learning paradigms. In particular, I will introduce how we help novice and experienced users scale their applications on TACC machines. I will also present our collaboration with other institutions on scientific deep learning applications and our research progress in applications, architecture, optimizers, and cyberinfrastructure.

Speaker Bio: Dr. Zhao Zhang is a computer scientist who leads the machine learning group within the Data Intensive Computing group at the Texas Advanced Computing Center (TACC). Prior to joining TACC in 2016, he was a postdoctoral researcher at the AMPLab, UC Berkeley, and a data science fellow at the Berkeley Institute for Data Science. Dr. Zhang received his Ph.D. from the Department of Computer Science at the University of Chicago in 2014. He has extensive experience in high performance computing (HPC) and big data systems. His recent research focuses on the fusion of HPC and deep learning (DL), spanning optimization algorithms, I/O, architecture, and domain applications.

4:00 - 4:30 PM


Speaker: Jason Lowe, NVIDIA

Session Chair: Aamir Shafi, The Ohio State University

Title: Big Data Analytics with the RAPIDS Accelerator for Apache Spark

Abstract: GPUs are a key tool for accelerating machine learning (ML) model training and inference, but CPUs are traditionally used to prepare the data for ML. Apache Spark is often used to perform data preparation at scale, but data preparation costs can be significant.

This talk will give an overview of the RAPIDS Accelerator for Apache Spark, which leverages GPUs to accelerate Spark DataFrame and SQL operations without any changes to the query code. We will discuss how it works and show the benefits of GPU acceleration not only for job latency but also for total cost and power consumption. The talk will also cover tools that help estimate job speedups and cost savings, which can be critical for prioritizing jobs for acceleration in large-scale operations.
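As a rough sketch of what this looks like in practice, the RAPIDS Accelerator is enabled through Spark configuration while the DataFrame/SQL code itself stays unchanged; the jar path, input file, and query below are illustrative assumptions.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("rapids-accelerated-etl")
    # Load the RAPIDS Accelerator plugin; the jar path/version is illustrative.
    .config("spark.jars", "/path/to/rapids-4-spark.jar")
    .config("spark.plugins", "com.nvidia.spark.SQLPlugin")
    .config("spark.rapids.sql.enabled", "true")
    .getOrCreate()
)

# Ordinary DataFrame code: supported operations run on the GPU, while
# unsupported ones fall back to the CPU.
df = spark.read.parquet("events.parquet")
result = df.groupBy("user_id").count().orderBy("count", ascending=False)
result.show()
```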

Speaker Bio: Jason Lowe is a distinguished software engineer at NVIDIA working on solutions to accelerate ETL processing. Prior to NVIDIA, he worked at Yahoo on the Big Data Platform team, and he is a PMC member of Apache Hadoop and Tez. He has held positions as an architect for the embedded Linux OS group at Motorola and as a principal programmer at Volition Games. Jason holds a BS in Computer Science from the University of Illinois at Urbana-Champaign.

Closing Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University