Recent advances in Artificial Intelligence and Big Data Analytics are driven by large amounts of data and computing hardware at extreme scale. The High Performance Computing (HPC) community, meanwhile, has been tackling similar extreme-scale computing challenges for several decades. The purpose of this workshop is to bring together researchers and engineers from the Artificial Intelligence, Big Data, and HPC communities on a common platform to share the latest challenges and opportunities. The workshop focuses on designing, implementing, and evaluating Artificial Intelligence and Big Data workloads on massively parallel hardware equipped with multi-/many-core CPUs and GPUs and connected by high-speed, low-latency networks such as InfiniBand, Omni-Path, and Slingshot. The Artificial Intelligence workloads comprise training and inference of Deep Neural Networks as well as traditional models, using a range of state-of-the-art Machine and Deep Learning frameworks including Scikit-learn, PyTorch, TensorFlow, ONNX, and TensorRT. Popular Big Data frameworks such as Apache Spark, Dask, and Ray are used to process large amounts of data on CPUs and GPUs for insightful analysis.

The TIES 2024 workshop talks will cover a range of areas, including but not limited to:

  • Programming models, techniques, and tools for High-Performance Big Data analytics, Machine Learning, and Deep Learning on massively parallel systems including Cloud Computing platforms
  • Performance and communication optimizations for Big Data and Machine/Deep Learning using HPC technologies with a focus on accelerators
  • High-Performance data-loading in Big Data and Machine/Deep Learning frameworks including in-memory computing technologies and abstractions
  • Performance modeling and evaluation for emerging Big Data processing and Machine/Deep Learning frameworks with emphasis on parallelism
  • Large-scale and parallel graph processing using Big Data and Machine/Deep Learning frameworks
  • Fault tolerance, reliability, and availability for high-performance Big Data computing, Deep Learning, and Cloud Computing
  • Scheduling and provisioning of compute and storage resources for data analytics and Artificial Intelligence workloads
  • Optimizing traditional scientific computing workloads with Deep Neural Networks
  • Case studies of Big Data and Machine/Deep Learning applications on HPC systems and Clouds
  • Parallel inferencing strategies on edge, HPC, and cloud systems
  • High-Performance columnar data management techniques for GPUs
  • Case studies of Big Data and Machine/Deep Learning frameworks exploiting heterogeneous multi-/many-core (OpenPOWER, Xeon, AMD, and ARM) systems and accelerators such as Intel/AMD/NVIDIA GPUs and FPGAs

Keynote Address


Speaker

Dan Stanzione, Texas Advanced Computing Center

Abstract

Title: The NSF Leadership Computing Facility, and the National AI Research Resource.

In this talk, I’ll cover upcoming developments in computing research infrastructure to support future big data/AI/HPC platforms, including design inputs for future systems. I’ll then discuss early experience on the Vista systems, among the first deployments of NVIDIA’s Grace CPU and Grace-Hopper CPU/GPU integration. Finally, I’ll touch on the programming models we are supporting, shifts in user workloads, and upcoming research directions.

Invited Talks


Organizing Committee


Workshop Organizers