A Tutorial on


High-Performance and Smart Networking Technologies for HPC and AI


In conjunction with SCA/HPCAsia 2026


Monday, January 26, 2026, 9:30 - 12:30 Japan Time


Osaka, Japan


by


Dhabaleswar K. Panda, Benjamin Michalowicz


The Ohio State University

Abstract

As InfiniBand (IB), High-speed Ethernet (HSE), RoCE, and Omni-Path technologies mature, they are being used to design and deploy various High-End Computing (HEC) systems: HPC clusters with GPGPUs supporting MPI, Storage and Parallel File Systems, Cloud Computing systems with SR-IOV Virtualization, Grid Computing systems, and Deep Learning systems. These systems bring new challenges in terms of performance, scalability, portability, reliability, and network congestion. Many scientists, engineers, researchers, managers, and system administrators are interested in learning about these challenges, the approaches used to address them, and the associated impact on performance and scalability. This tutorial will start with an overview of these systems. It will then emphasize the advanced hardware and software features of IB, Omni-Path, HSE, and RoCE and their capabilities to address these challenges. Next, we will focus on OpenFabrics RDMA and Libfabric programming, and on the network management infrastructure and tools needed to use these systems effectively. A common set of challenges faced while designing these systems will be presented, followed by case studies covering domain-specific design challenges, their solutions, and sample performance numbers. Finally, hands-on exercises will be carried out with the OpenFabrics and Libfabric software stacks and network management tools.

Outline

Presenters

Dhabaleswar K. Panda

DK Panda is a Professor and University Distinguished Scholar of Computer Science and Engineering at the Ohio State University. He is serving as the Director of the ICICLE NSF-AI Institute (https://icicle.ai). He has published over 500 papers. The MVAPICH MPI libraries, designed and developed by his research group (http://mvapich.cse.ohio-state.edu), are currently being used by more than 3,450 organizations worldwide (in 93 countries). More than 1.95 million downloads of this software have taken place from the project's site. This software is empowering many clusters in the TOP500 list. High-performance and scalable solutions for DL/ML frameworks from his group are available from https://hidl.cse.ohio-state.edu. Similarly, scalable and high-performance solutions for Big Data and Data Science frameworks are available from https://hibd.cse.ohio-state.edu. Prof. Panda is an IEEE Fellow, an ACM Fellow, and the recipient of the 2022 IEEE Charles Babbage and 2024 TCPP Outstanding Service and Contribution Awards. More details about Prof. Panda are available at http://www.cse.ohio-state.edu/~panda.

Benjamin Michalowicz

Ben Michalowicz is a fifth-year PhD student at the Ohio State University, working under Prof. DK Panda in the Network-Based Computing Laboratory. His research interests include high-performance computing (HPC), parallel architectures, network-based computing for HPC, and parallel programming environments. Specifically, he is interested in efficiently offloading workloads to Smart Network Cards such as NVIDIA's BlueField DPUs. Ben actively contributes to the MVAPICH software.