(Digital Workshop this year)
Extreme-scale computing in HPC, Big Data, Deep Learning, and Clouds is marked by multiple levels of hierarchy and heterogeneity, ranging from compute units (many-core CPUs, GPUs, APUs, etc.) to storage devices (NVMe, NVMe over Fabrics, etc.) to network interconnects (InfiniBand, High-Speed Ethernet, Omni-Path, etc.). Owing to the plethora of heterogeneous communication paths with different cost models expected to be present in extreme-scale systems, data movement lies at the heart of many challenges for exascale computing. At the same time, advances in networking technologies such as node-level interconnects (like NVLink), RDMA-enabled networks, and the like are constantly pushing the envelope of research in novel communication and computing architectures for extreme-scale computing. The goal of this workshop is to bring together researchers and software/hardware designers from academia, industry, and national laboratories who are involved in creating network-based computing solutions for extreme-scale architectures. The objectives of this workshop are to share the experiences of the members of this community and to learn about the opportunities and challenges in design trends for exascale communication architectures.
ExaComm 2021 welcomes original submissions in a range of areas, including but not limited to:
Satoshi Matsuoka, RIKEN Center for Computational Science, Japan
Title: Fugaku and its Advanced Network Features for Disaggregation
Fugaku is currently the fastest supercomputer in the world, and its technical innovations and advancements lie not only in the CPU itself but also in its interconnect. In fact, Fugaku incorporates not only the Tofu-D network interface and its associated DMAC, but also the 10-port switch that comprises the 6D torus network. Since the network interface is directly connected to the intra-chip interconnect ring that also connects to memory and the L2 cache, the Tofu-D network allows the composition of a disaggregated architecture, in which any memory in the system can be accessed directly from any CPU via RDMA, and the data can be injected into the L2 cache. Such features allow for very low-latency communication for MPI, especially sub-microsecond one-sided communication, but we also expect that distributed shared memory features can be implemented efficiently on Fugaku, possibly matching the performance of hardware-based NUMA machines.