High-Performance Deep Learning and Machine Learning


The availability of large data sets (e.g. ImageNet, PASCAL VOC 2012), coupled with the massively parallel processors in modern HPC systems (e.g. NVIDIA GPUs), has fueled a renewed interest in Deep Learning (DL) and Machine Learning (ML) models. Although massively parallel accelerators like GPUs dominate DL/ML workloads, modern CPUs, with their wide availability and large memory capacity, offer a viable alternative for DL/ML training. This resurgence of DL/ML applications has triggered the development of DL frameworks like PyTorch, TensorFlow, LBANN, and Apache MXNet, as well as ML frameworks like Scikit-Learn and cuML. While most DL/ML frameworks provide experimental support for multi-node training, their distributed implementations are often suboptimal. Further, the emergence of distributed DL frameworks such as Horovod and DeepSpeed introduces new parallelism challenges.


The objective of the HiDL/HiML projects is to design and implement novel parallelization strategies for training next-generation out-of-core models, and to exploit modern HPC technologies and solutions to fundamentally improve the performance of distributed DL/ML training and inference.

Conferences & Workshops (9)


Ph.D. Dissertations (1)

1 M. Bayatpour, Designing High Performance Hardware-assisted Communication Middlewares for Next-Generation HPC Systems, May 2021

M.S. Theses (2)

1 S. Srivastava, MVAPICH2-AutoTune: An Automatic Collective Tuning Framework for the MVAPICH2 MPI Library, May 2021
2 N. Senthil Kumar, Designing Optimized MPI+NCCL Hybrid Collective Communication Routines for Dense Many-GPU Clusters, May 2021