Extreme-scale computing in HPC, Big Data, Deep Learning, and Clouds is marked by multiple levels of hierarchy and heterogeneity, ranging from compute units (many-core CPUs, GPUs, APUs, etc.) to storage devices (NVMe, NVMe over Fabrics, etc.) to network interconnects (InfiniBand, High-Speed Ethernet, Omni-Path, etc.). Owing to the plethora of heterogeneous communication paths with different cost models expected in extreme-scale systems, data movement sits at the heart of many of the challenges of exascale computing. At the same time, advances in networking technologies such as node-level interconnects (like NVLink), RDMA-enabled networks, and the like are constantly pushing the envelope of research on novel communication and computing architectures for extreme-scale computing. The goal of this workshop is to bring together researchers and software/hardware designers from academia, industry, and national laboratories who are involved in creating network-based computing solutions for extreme-scale architectures. The objectives of this workshop are to share the experiences of the members of this community and to explore the opportunities and challenges in design trends for exascale communication architectures.

All times in Central European Summer Time (CEST)

Workshop Program

9:00 - 9:05 AM

Opening Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University

9:05 - 10:00 AM

Keynote

Speaker: Scott Atchley, Oak Ridge National Laboratory

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Frontier’s Exascale Architecture

Abstract: The Oak Ridge Leadership Computing facility has deployed several TOP500 number one systems including Jaguar, Titan, and Summit. OLCF is currently deploying Frontier, America’s first exascale supercomputer. This talk will examine the evolution of OLCF system designs and their impacts on the communication architecture and programming models. This evolution includes the addition of GPUs, the deployment of various interconnect topologies, and the adoption of new communication interfaces.

Speaker Bio: Scott Atchley is a Distinguished Scientist with the Oak Ridge National Laboratory’s National Center for Computational Science. He is the Systems Architecture team lead within the Technology Integration Group in the Advanced Technologies Section. Scott and his team focus on understanding technology trends and application needs in order to guide system procurements. Scott has been heavily involved in DOE’s Exascale Computing Initiative and Project. Scott served as the DOE Technical Representative for AMD’s FastForward-2 Node architecture program and for AMD’s PathForward program. He also led the ECP HPCM Testing and Evaluation effort and he co-leads ECP’s Slingshot Testing and Evaluation effort. Scott’s primary focus is on his role as the Technical Project Officer for Frontier.

10:00 - 10:30 AM

Speaker: Gilad Shainer, Mellanox/NVIDIA

Session Chair: Taisuke Boku, University of Tsukuba, Japan

Title: InfiniBand In-Network Computing: Accelerating HPC and Cloud Applications

Abstract: The high-performance InfiniBand network provides computing services via in-network computing acceleration engines, such as data aggregation and reduction, Message Passing Interface (MPI) Tag Matching, MPI All-to-All, and more. Offloading these data algorithms to the network decreases the amount of data traversing the network, dramatically reduces the time spent in communication framework operations, enables compute and communication overlap, and increases datacenter efficiency.
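
As a rough illustration of the compute/communication overlap described above, the minimal sketch below posts a nonblocking MPI allreduce and continues with independent work while the reduction progresses; on a fabric with in-network reduction engines, the collective can complete with little host involvement. It uses only standard MPI calls; the buffer size and the overlapped work are placeholders, and no vendor-specific offload API is assumed (offload, where available, is typically enabled through the MPI library and job environment rather than application code).

    /* Minimal sketch (standard MPI only): overlap independent computation with a
     * nonblocking allreduce. Build with mpicc and run under mpirun. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 1024

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        double local[N], global[N];
        for (int i = 0; i < N; i++)
            local[i] = 1.0;

        /* Post the reduction and return immediately. */
        MPI_Request req;
        MPI_Iallreduce(local, global, N, MPI_DOUBLE, MPI_SUM,
                       MPI_COMM_WORLD, &req);

        /* ... independent computation overlapped with the collective ... */

        MPI_Wait(&req, MPI_STATUS_IGNORE);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0)
            printf("global[0] = %.1f\n", global[0]);

        MPI_Finalize();
        return 0;
    }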

Speaker Bio: Gilad Shainer serves as senior vice president of networking at NVIDIA. Mr. Shainer serves as the chairman of the HPC-AI Advisory Council organization, the president of the UCF and CCIX consortia, a member of the IBTA, and a contributor to the PCI-SIG PCI-X and PCIe specifications. Mr. Shainer holds multiple patents in the field of high-speed networking. He is a recipient of the 2015 R&D 100 award for his contribution to the CORE-Direct In-Network Computing technology and of the 2019 R&D 100 award for his contribution to the Unified Communication X (UCX) technology. Gilad Shainer holds an MSc degree and a BSc degree in Electrical Engineering from the Technion Institute of Technology in Israel.

10:30 - 11:00 AM

Speaker: Rob Sherwood, Intel

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: The Network Inside vs. Outside Your Servers

Abstract: Historically, the network outside of servers was designed completely differently from the network inside. The network outside dealt with contention and asynchronicity, and as a result was difficult to engineer for predictable performance. By contrast, the memory and I/O systems inside servers historically had low contention, were highly synchronous, and as a result typically provided predictable performance. However, with modern workloads (including HPC, AI training, live video transcoding, etc.), these two design points are converging: we need predictable performance from the network outside our servers, while the network inside is growing increasingly contended and becoming less predictable. This talk will review the causes of this trend and discuss technologies on the horizon that can help, including scale-out SDN-style centralized control and the role of optical I/O systems.

Speaker Bio: Rob Sherwood recently joined Intel as the CTO of the Connectivity Group (Switch ASICs, NICs, IPUs and Silicon Photonics) and is leading the IPDK open-source project as well as next-generation data center pathfinding. He has a background in running large production networks, shipping products, and publishing networking research. Previously, Rob helped lead Facebook's global network reliability efforts and worked on their FBOSS switch operating system. Rob also served as CTO at the networking startup Big Switch Networks (recently acquired by Arista). Rob also worked at Deutsche Telekom and participated in the early days of SDN and OpenFlow via the Stanford Clean Slate lab. Rob is an alumnus of the University of Maryland, where he received his Ph.D. in Computer Science.

11:00 - 11:30 AM

Morning Coffee Break

11:30 - 12:00 PM

Speaker: Yuichiro Ajima, Fujitsu, Japan

Session Chair: Hari Subramoni, The Ohio State University

Title: Tofu Interconnect D of Supercomputer Fugaku and Future Prospects

Abstract: This presentation will describe how Tofu Interconnect D supports the massively parallel architecture of the supercomputer Fugaku. Like its predecessor, the K computer, Fugaku has a massively parallel architecture focused on a parallel programming model. Both K computer and Fugaku nodes are single-socket CPUs that consume about 200W of power, and the role of the interconnect is critical to achieving highly scalable performance. Applications are first optimized on a single-socket CPU and then parallelized using multiple nodes in the next step. Therefore, the injection bandwidth of the interconnect is very important. Fugaku has an injection bandwidth of 6.5 PB/s for an overall system peak performance of 537 PFlops and a memory bandwidth of 163 PB/s. As a comparison, consider a fat-node architecture with accelerators that consumes about 3000W of power. Applications are first optimized on a single-socket CPU and then offload the processing to the accelerators in the next step. Accelerator connection bandwidth is therefore very important. The importance of accelerator connection bandwidth in fat-node architectures is comparable to that of injection bandwidth in massively parallel architectures. For example, the accelerators in Oak Ridge National Laboratory's Summit system have a connection bandwidth of 4.1 PB/s for a peak performance of 207 PFlops and a memory bandwidth of 25 PB/s. The Tofu interconnect is not suitable for access paths to the global file system or login nodes because of its topology, which focuses on tightly coupling nearest-neighbor compute nodes. Therefore, Fugaku also includes InfiniBand EDR at a ratio of one per 192 nodes. With a total of 828 InfiniBand EDR ports, Fugaku is connected to the global file system and login nodes via 36 36-port switches and two 324-port director switches. This presentation will also include a discussion of future systems and interconnects in Japan's NGACI community activities. The next generation is envisioned to leap beyond the K computer and Fugaku architecture, which will inevitably pose new challenges for the interconnect as well.
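
To make the comparison in the abstract concrete, the short back-of-the-envelope sketch below divides the quoted bandwidths by the quoted peak performance to obtain bytes per flop for each system. The figures are taken directly from the abstract, not independently measured; the code is only an illustrative calculation.

    /* Back-of-the-envelope sketch using the figures quoted in the abstract:
     * bandwidth (PB/s) divided by peak performance (PFlops) gives bytes/flop. */
    #include <stdio.h>

    int main(void)
    {
        /* Fugaku: injection bandwidth vs. system peak. */
        double fugaku = 6.5 / 537.0;   /* ~0.012 bytes/flop */
        /* Summit: accelerator connection bandwidth vs. system peak. */
        double summit = 4.1 / 207.0;   /* ~0.020 bytes/flop */

        printf("Fugaku injection bandwidth:       %.4f bytes/flop\n", fugaku);
        printf("Summit accelerator connection bw: %.4f bytes/flop\n", summit);
        return 0;
    }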

Speaker Bio: Dr. Yuichiro Ajima is a Principal Architect of the Future Society and Technology Unit of Fujitsu Limited. He received his doctoral degree (in Information Engineering) from the University of Tokyo and joined Fujitsu Laboratories Limited in 2002. He has been in charge of the development of supercomputer systems since 2007. He is the architect of the Tofu interconnect series, which has powered the K computer, the supercomputer Fugaku, and the PRIMEHPC series. He received the Ichimura Prize in Industry for Distinguished Achievement in 2012, the Imperial Invention Prize in 2014, the Commendation for Science and Technology by the Minister of Education, Culture, Sports, Science and Technology in 2017, and the Medal of Honor with Purple Ribbon in 2020.

12:00 - 12:30 PM

Speaker: Phil Murphy, Cornelis Networks

Session Chair: Hari Subramoni, The Ohio State University

Title: Unleashing system performance at scale in AI/HPDA/HPC environments via Cornelis Networks Omni-Path, a high-performance purpose-built scale-out interconnect

Abstract: The ability to effectively focus a significant amount of computational power on solving critical problems through the use of artificial intelligence, data analytics, and modeling/simulation techniques is vital to many scientific, commercial, and government organizations. The evolution of node hardware and software architectures to meet the rising workload challenges is putting unprecedented demands on the system interconnect. Highly engineered enablement of OpenFabrics Interfaces, coupled with highly efficient interconnect infrastructure capabilities, unleashes the aggregate performance potential of the participating nodes.
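
As an illustrative sketch of what "enablement of OpenFabrics Interfaces" looks like from the application or middleware side, the snippet below uses libfabric's fi_getinfo() to enumerate the providers and fabrics available on a node. The call and structures are standard libfabric; which providers actually appear (for example, an Omni-Path provider) depends on the installed libfabric build and hardware, so the output is not specific to Cornelis Networks products.

    /* Hedged sketch: enumerate available OpenFabrics Interfaces (libfabric)
     * providers on a node. Compile with -lfabric; what is reported depends on
     * the local installation. */
    #include <stdio.h>
    #include <rdma/fabric.h>

    int main(void)
    {
        struct fi_info *info = NULL, *cur;

        /* Ask libfabric for every provider/endpoint combination it supports. */
        int ret = fi_getinfo(FI_VERSION(1, 9), NULL, NULL, 0, NULL, &info);
        if (ret) {
            fprintf(stderr, "fi_getinfo: %s\n", fi_strerror(-ret));
            return 1;
        }

        for (cur = info; cur; cur = cur->next)
            printf("provider: %-12s fabric: %s\n",
                   cur->fabric_attr->prov_name, cur->fabric_attr->name);

        fi_freeinfo(info);
        return 0;
    }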

Speaker Bio: As CEO of Cornelis Networks, Phil Murphy is responsible for the overall management and strategic direction of the company. Prior to co-founding Cornelis Networks, Phil served as a director at Intel Corporation, responsible for fabric platform planning and architecture, product positioning, and business development support. Prior to that role, Phil served as vice president of engineering and vice president of HPC technology within QLogic’s Network Solutions Group, responsible for the design, development, and evangelizing of all high-performance computing products. Before joining QLogic, Phil co-founded high-performance interconnect startup SilverStorm Technologies, acquired by QLogic. Phil holds an MS degree in Computer and Information Science from the University of Pennsylvania.

12:30 - 1:00 PM

Speaker: Brendan Bouffler, AWS

Session Chair: Hari Subramoni, The Ohio State University

Title: AWS’s Elastic Fabric Adapter - a peculiar approach to solving for application scalability in HPC and Machine Learning

Abstract: EFA is the product of a different approach to solving some of the same problems in HPC when faced with some new challenges and - in particular - a new environment that wasn’t present during previous epochs of HPC: a hyper-scaled data center of heterogeneous node types with virtually no downtime, but with continuous growth over generations. We’ll touch on the origins of the project and its results, and bring you up to date with some new challenges we’re solving right now.

Speaker Bio: Brendan Bouffler has 25 years of experience in the global tech industry creating large-scale systems for HPC environments. He’s been responsible for designing and building thousands of systems for researchers and engineers on every continent. Many of these efforts fed the Top500 list, including some that made the top 5.

After leading the HPC organization in Asia for a hardware maker, Brendan joined Amazon in 2014 when it became clear to him that cloud would become the exceptional computing tool the global research & engineering community needed to bring forward the discoveries that would change the world for us all.

He holds a degree in Physics and an interest in testing several of its laws as they apply to bicycles. This has frequently resulted in hospitalization. He is based in London.

1:00 - 2:00 PM

Lunch Break

2:00 - 2:30 PM

Speaker: John Shalf, Lawrence Berkeley National Laboratory

Session Chair: Aamir Shafi, The Ohio State University

Title: Ultrascale System Interconnects at the End of Moore’s Law

Abstract: The tapering of the lithography advances that have been associated with Moore’s Law will substantially change requirements for future interconnect architectures for large-scale datacenters and HPC systems. Architectural specialization is creating new datacenter requirements; emerging accelerator technologies for machine learning workloads and rack disaggregation strategies will push the limits of current interconnect technologies. Whereas photonic technologies are often sold on the basis of higher bandwidth and energy efficiency (e.g., lower picojoules per bit), these emerging workloads and technology trends will shift the emphasis to other metrics, such as bandwidth density (as opposed to bandwidth alone), reduced latency, and performance consistency. Such metrics cannot be achieved with device improvements alone; they require a systems view of photonics in datacenters.
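
For readers unfamiliar with the picojoules-per-bit metric mentioned above, the tiny sketch below converts an assumed energy-per-bit and link rate into sustained link power. The 5 pJ/bit and 400 Gb/s figures are illustrative assumptions, not numbers from the talk.

    /* Illustrative sketch: link power = energy per bit x data rate.
     * 1 pJ/bit at 1 Gb/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W. */
    #include <stdio.h>

    int main(void)
    {
        double pj_per_bit = 5.0;    /* assumed energy per bit, pJ */
        double rate_gbps  = 400.0;  /* assumed link data rate, Gb/s */

        double watts = pj_per_bit * rate_gbps * 1e-3;
        printf("%.0f pJ/bit at %.0f Gb/s -> %.1f W per link\n",
               pj_per_bit, rate_gbps, watts);
        return 0;
    }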

Speaker Bio: John Shalf is Department Head for Computer Science at Lawrence Berkeley National Laboratory and recently was deputy director of Hardware Technology for the DOE Exascale Computing Project. Shalf is a coauthor of over 80 publications in the field of parallel computing software and HPC technology, including three best papers and the widely cited report “The Landscape of Parallel Computing Research: A View from Berkeley” (with David Patterson and others). He also coauthored the 2008 “ExaScale Software Study: Software Challenges in Extreme Scale Systems,” which set the Defense Advanced Research Projects Agency’s (DARPA’s) information technology research investment strategy. Prior to coming to Berkeley Laboratory, John worked at the National Center for Supercomputing Applications and the Max Planck Institute for Gravitational Physics/Albert Einstein Institute (AEI), where he was co-creator of the Cactus Computational Toolkit.

2:30 - 3:00 PM

Speaker: Keith Underwood, HPE

Session Chair: Aamir Shafi, The Ohio State University

Title: How HPC Followed the Unlikely Path to Ethernet

Abstract: For decades, the largest-scale systems were built using proprietary networks. Ten years ago, Ethernet only existed on the Top500 list in the lowest-ranked systems, and nobody would have taken seriously a proposal to build a leadership-class system using Ethernet. Ethernet versus “EtherNOT” was an ongoing, entertaining, academic debate. Then, in 2019, the unthinkable happened: the US DOE announced that its next leadership-class systems would be based on Ethernet – specifically Cray Slingshot (now HPE Slingshot). What changed? How did the world’s leading proprietary interconnect maker come to be building Ethernet-based solutions? How did customers come to accept Ethernet? This talk will discuss what it takes to deliver a leadership-class system leveraging an HPC Ethernet solution, and the changes in the industry that made it possible.

Speaker Bio: Keith Underwood is a Distinguished Technologist in the HPE Slingshot advanced architecture group, where he leads next-generation NIC architecture definition. Prior to joining Cray, Keith led the Omni-Path 2 NIC architecture at Intel. He was part of the team that created the Portals 4 API and the MPI-3 RMA extensions.

3:00 - 3:30 PM

Speaker: Hemal Shah, Broadcom

Session Chair: Aamir Shafi, The Ohio State University

Title: Broadcom NIC Technologies for large-scale HPC and AI/ML systems

Abstract: With Ethernet speeds exceeding 100 Gbps and the availability of low-latency, high-performance RoCE implementations, Ethernet is primed for the HPC, AI, and ML markets. Broadcom Ethernet NICs are at the forefront of addressing the needs of HPC, AI, and ML workloads with RDMA offloads, software infrastructure support, and congestion control features. In this talk, we will provide an overview of Broadcom NIC technologies for HPC, AI, and ML applications.

Speaker Bio: Hemal Shah is a Distinguished Engineer and Systems/Software/Standards architect in the Data Center Solutions Group (DCSG) division at Broadcom Inc. He leads and manages a team of architects. Hemal is responsible for the definition of product architecture and software roadmap/architecture of all product lines of Ethernet NICs. Hemal led the architecture definition of several generations of NetXtreme E-Series/NetXtreme I server product lines and NetXtreme I client product lines. Hemal spearheaded the system architecture development of TruFlow technology for vSwitch acceleration/packet processing software frameworks, TruManage technology for system and network management, device security features, virtualization and stateless offloads. Hemal has defined the system architecture of RDMA hardware/software solutions for more than two decades.

Before joining Broadcom in 2005, Hemal worked at Intel Corporation, where he led the development of the system/silicon/software architecture of communication processors, 10 Gigabit Ethernet controllers, TCP/iSCSI/RDMA offloads, and IPsec/SSL/firewall/VPN accelerations. Hemal is the lead technical representative/contributor from Broadcom Inc. in the Open Compute Project (OCP) and the Distributed Management Task Force (DMTF). Hemal serves as Senior VP of Technology in the DMTF and as a co-lead of the OCP Hardware Management project. Hemal has co-authored several OCP specifications, 70+ DMTF specifications, four IETF RFCs, and 10+ technical conference/journal papers. Hemal is a named inventor on 40+ patents, with several more pending. Hemal holds Ph.D. (computer engineering) and M.S. (computer science) degrees from Purdue University, an M.S.E.E. degree from The University of Arizona, and a B.S. (electronics and communication engineering) degree from Gujarat University, India.

3:30 - 4:00 PM

Speaker: Steve Poole, Los Alamos National Laboratory

Session Chair: Aamir Shafi, The Ohio State University

Title: Making a case for Data Accelerators at LANL

Abstract: LANL has defined a “tailoring” philosophy for future systems. Aligning with that overall philosophy, LANL is evaluating the overall distribution of the elemental components of a variety of applications. With this in mind, we are currently evaluating both computational storage devices (CSDs) and DPUs for the purpose of offloading relevant pieces of applications.

Speaker Bio: Steve Poole is a Senior Scientist-6 at Los Alamos National Laboratory. He works in the areas of advanced architectures, system-focused application tailoring, advanced algorithms, and true codesign. He is a co-founder of the UCF Consortium along with OpenSHMEM, OpenSNAPI, OpenUCX, OpenHPCA, and AHUG.

4:00 - 4:30 PM

Afternoon Coffee Break

4:30 - 5:00 PM

Speaker: Matthew Williams, Rockport Networks

Session Chair: Dhabaleswar K (DK) Panda, The Ohio State University

Title: Rockport Networking Solutions for Large-Scale HPC and AI systems

Abstract: The Rockport Switchless Network is a distributed, high-performance, direct-interconnect solution modeled after the world's fastest supercomputers. Architected to achieve consistently fast and predictable performance, a Rockport network improves cluster performance while helping create a greener data center. Matt Williams, CTO of Rockport, will provide an overview of the Rockport Switchless Network architecture, with best practices for performance and operational simplicity, along with a deeper dive into advances in scaled performance, congestion protection, and resiliency. We will also share benchmark and workload results comparing the switchless network approach with traditional networks.

Speaker Bio: Matt Williams brings 25 years of deep technical and business expertise to the role of Chief Technology Officer at Rockport Networks, where he is responsible for overall product vision and strategy. Prior to joining Rockport, Matt held various customer-facing technology positions at high-growth companies, including systems engineering, product management and product marketing. He is an expert strategist, analyst and visionary who has delivered on transformational product concepts across five generations of network architectures and has been awarded 22 patents. An active HPC community member, Matt is a sought-after speaker with recent presentations at SC21, ExaComm and the MVAPICH User Group Forum. Matt holds a B.Sc. in Electrical Engineering with First Class Honours from Queen’s University and is a registered P.Eng.

5:00 - 6:00 PM

Moderator: Arthur Maccabe, Institute for the Future of Data and Computing, University of Arizona

Title: Shaping the Future of Distributed Computing: Bottom up, or Top down

Summary: Historically, distributed computing addressed the technologies needed to connect otherwise autonomous computing systems to enable coordinated computations, e.g., transactions involving multiple institutions. This is essentially a bottom-up approach, in which a distributed system is built from pre-existing systems. More recently, a top-down approach seems to be emerging, driven by the US DOE National Laboratories. In the “Superfacility” approach, leadership-class systems, e.g., Frontier at ORNL, will be extended (pushed) to the edge. That is, nodes or cabinets similar to those of Frontier will be placed near scientific instruments (like the Spallation Neutron Source), and the interconnect fabric (e.g., Slingshot) will be extended to include these nodes.

  • Which approach, bottom-up or top-down, is most likely to shape the future of distributed computing?
  • To what extent will (should) economic considerations drive the future of distributed computing?
  • To what extent does the notion of technology refresh rate factor into how distributed systems should be built?
  • How much will developments in interconnect technologies, possibly including wireless technologies, be important in shaping this future?
  • How much will the required software infrastructure be critical in shaping this future?

Moderator Bio: Professor Arthur B. (Barney) Maccabe is the Executive Director of the Institute for the Future of Data and Computing (IFDC) at the University of Arizona. This newly created institute seeks to cultivate the data and computing capabilities needed to address grand challenges. An important goal of the institute is to elevate the visibility, impact, and sustainability of the artifacts created by the UArizona research community. The institute is committed to building deep relationships with industry and the research communities that are advancing the foundations of data and computing (including library science, mathematics, computer science, and computer engineering). Dr Maccabe’s faculty appointment is in the School of Information (iSchool) at the University of Arizona.

From 2009 to 2022, Dr Maccabe served as the Director of the Computer Science and Mathematics Division (CSMD) at the Oak Ridge National Laboratory (ORNL). The division has about 150 staff focused on use-inspired, fundamental research in computer science and applied mathematics. The division is organized in three sections: Advanced Computing Systems Research, Data and AI Systems, and Mathematics in Computation. During this time, Dr. Maccabe also served as the ORNL Point of Contact for the research portion of the Advanced Scientific Computing Research (ASCR) program in the Department of Energy, Office of Science.

Prior to his appointment at ORNL in 2009, Dr. Maccabe spent over twenty-five years as a faculty member in the Computer Science Department at the University of New Mexico (UNM). He graduated eleven PhD students and nine students with Masters degrees. While at the University of New Mexico, Dr. Maccabe also served as the director of UNM's Center for High Performance Computing (now called the Center for Advanced Research Computing) and as the Interim Chief Information Officer (CIO) for the university.

His research has focused on the design and development of system software for massively parallel systems. He was an architect for a series of lightweight operating systems for massively parallel computing systems. Dr. Maccabe has also conducted sponsored research in a wide range of areas, including dependence representation for compilers, network intrusion detection, network protocol offload, lightweight file and I/O systems, system software for sensor networks, and virtualization in high-end computing systems.

Panelists:

  • Sadaf Alam, Swiss National Supercomputing Centre (CSCS), Switzerland
  • Debbie Bard, LBNL

    Bio: Debbie Bard is a physicist and data scientist with more than 15 years of experience in scientific computing, both as a physicist and as an HPC expert. Her career spans research in particle physics, cosmology, and computing, with a common theme of using supercomputing for scalable data analytics. She currently leads the Data Science Engagement Group at the National Energy Research Scientific Computing Center (NERSC), supporting HPC for users of the DOE’s experimental and observational facilities. Debbie also leads the Superfacility project, a cross-discipline project of over 30 researchers and engineers that is coordinating research and development to support supercomputing for experimental science at LBNL.

  • Norbert Eicker, Jülich Supercomputing Centre (JSC), Germany

    Bio: Norbert Eicker is Professor for Parallel Hard- and Software Systems at Bergische Universität Wuppertal and head of the research group Cluster Computing at Jülich Supercomputing Centre. Before joining JSC in 2004, Norbert was with ParTec, working on the cluster middleware ParaStation. During his career he has been involved in several research and development projects on large-scale cluster systems in Wuppertal and Jülich. In recent years he acted as the lead architect for the DEEP series of projects, helping to develop the Cluster-Booster concept and the Modular Supercomputing Architecture, and he is now working on DEEP-SEA.

  • Rich Graham, NVIDIA/Mellanox

    Bio: Dr. Richard Graham is Senior Director, HPC Technology, at NVIDIA's Networking business unit. His primary focus is on HPC network software and hardware capabilities for current and future HPC technologies. Prior to moving to Mellanox/NVIDIA, Rich spent thirteen years at Los Alamos National Laboratory and Oak Ridge National Laboratory in computer science technical and administrative roles, with a technical focus on communication libraries and application analysis tools. He is a co-founder of the Open MPI collaboration and was chairman of the MPI 3.0 standardization effort.

6:00 - 6:05 PM

Closing Remarks

Hari Subramoni, Aamir Shafi, and Dhabaleswar K (DK) Panda, The Ohio State University