ExaComm 2016

Extreme scale computing is marked by multiple levels of hierarchy and heterogeneity, ranging from the compute units to storage devices to the network interconnects. Owing to the plethora of heterogeneous communication paths with different cost models expected to be present in extreme scale systems, data movement is seen as the root of many challenges for exascale computing. On the other hand, advances in networking technologies such as networks-on-chip (NoCs), high-speed links like NVLink, RDMA-enabled networks, and the like are constantly pushing the envelope of research in novel communication and computing architectures for extreme scale computing. The goal of this workshop is to bring together researchers and software/hardware designers from academia, industry, and national laboratories who are involved in creating network-based computing solutions for extreme scale architectures, to share their experiences and to learn about the opportunities and challenges in designing next-generation HPC systems and applications.

ExaComm welcomes original submissions in a range of areas, including but not limited to:

Scalable communication protocols
High performance networks
Runtime/middleware designs
Impact of high performance networks on Deep Learning / Machine Learning
Impact of high performance networks on Big Data
Novel hardware/software co-design
High performance communication solutions for accelerator based computing
Power-aware techniques and designs
Performance evaluations
Quality of Service (QoS)
Resource virtualization and SR-IOV

Workshop Program
8:50 - 9:00	Opening Remarks
9:00 - 10:00	Keynote: Optimizing Network Usage on Sequoia and Sierra
Speaker: Bronis R. de Supinski, CTO, Livermore Computing (LC), Lawrence Livermore National Laboratory (LLNL)
Abstract: Lawrence Livermore National Laboratory (LLNL) has a long history of leadership in large-scale computing. Our current platform, Sequoia, is a 96-rack Blue Gene/Q system that is currently number three on the Top 500 list. Our next platform, Sierra, will be a heterogeneous system delivered by a partnership between IBM, NVIDIA, and Mellanox. In this talk, we will explore optimizations of applications that run on these platforms, with a focus on their networks and the software that enables their efficient use.
10:00 - 10:30	Toward Extreme Scale: Requirements for Next-Generation Fabrics
Speaker: William (Bill) Magro, Intel Fellow & Chief Technologist, HPC Software, Intel
Abstract: Achieving performance at extreme scale requires a messaging fabric that meets a wide range of demanding requirements, including high bandwidth, high message rate, low latency, congestion management, low jitter, and minimal software overheads, among others. In this talk, we will present the key messaging fabric requirements Intel sees for achieving the next level of extreme-scale performance with Intel Omni-Path Architecture.
10:30 - 11:00	Topology-Awareness in the Tofu Interconnect Series
Speaker: Yuichirou Ajima, Senior Architect, Fujitsu, Japan
Abstract: Topology-aware communication is important for efficient data movement across a large-scale decentralized direct network. To encourage topology-aware optimization of communication, the system should provide topology-aware scheduling. This talk presents the topology-awareness features of the Tofu and Tofu2 interconnects and their associated software stacks.
11:00 - 11:30	Coffee Break
11:30 - 12:00	Technologies for Improved Scaling on GPU Clusters
Speaker: Jiri Kraus, Senior Developer, NVIDIA
Abstract: Designing scalable applications requires exposing parallelism and minimizing parallel overheads. In this talk, we focus on the latter and present GPUDirect technologies that maximize inter-GPU bandwidth and minimize latencies to improve scaling on GPU clusters. This includes the latest addition, GPUDirect Async, as well as benchmarking results with NVLink and Tesla P100.
12:00 - 12:30	BXI - The New Scalable Interconnect for HPC
Speaker: Jean-Pierre Panziera, Chief Technology Director for Extreme Computing, Bull Atos Technologies
Abstract: BXI, the Bull eXascale Interconnect, is the new interconnection network developed by Atos for High Performance Computing. We first present an overview of the BXI network. The BXI network is designed and optimized for HPC workloads at very large scale. It is based on the Portals 4 protocol and permits a complete offload of communication primitives to hardware, thus enabling independent progress of computation and communication. We describe the BXI software stack for fabric management, monitoring, application runtime, and performance analysis. We then explain how BXI traffic can be re-routed while the system is running without any interruption of service, thus providing the highest level of resilience in case of a component failure and the maximum flexibility to adapt to different workloads.
12:30 - 1:00	About Management of Exascale Systems
Speaker: Tor Skeie, Professor, Adjunct Research Scientist, Simula Research Laboratory, Norway
Abstract: This talk addresses the management of Exascale systems, covering both the various challenges our community is facing and possible solutions. First, we discuss how to achieve scalability with InfiniBand (IB), considering OFED's Scalable SA approach and recent research contributions, including the use of IB routers. Another important aspect of achieving Exascale scalability is the ability to carry out effective reconfigurations that are topology- and routing-aware, in order to provide reliability and maintain performance when the traffic pattern changes. Here we present recent research results such as SlimUpdate, generalized metabase-aided routing, and "hierarchical" reconfiguration. Finally, the talk covers ideas and challenges related to introducing self-adaptive management of multi-tenant HPC systems, paving the way for a merger of HPC and cloud computing.
1:00 - 2:00	Lunch Break
2:00 - 2:30	Exascale by Co-Design Architecture
Speaker: Michael Kagan, CTO, Mellanox Technologies
Abstract: High performance computing has begun scaling beyond Petaflop performance towards the Exaflop mark. One of the major concerns throughout the development toward such performance capability is scalability: at the component level, the system level, the middleware level, and the application level. A co-design approach between the development of the software libraries and the underlying hardware can help to overcome those scalability issues and enable a more efficient design approach towards the Exascale goal.
2:30 - 3:00	Next Generation Interconnection for Accelerated Computing
Speaker: Taisuke Boku, Professor, University of Tsukuba, Japan
Abstract: Up to the current era of tens of PFLOPS peak performance driven by accelerators such as GPUs or MIC, all inter-node communication between these accelerating devices has depended on common high performance interconnects such as InfiniBand. When the performance gap, including latency, between these communication channels and the absolute performance of the accelerating devices becomes much larger than today, we will need brand-new solutions to exploit the devices' potential performance on real-world problems. We have been developing a more direct solution to this problem, introducing FPGA technology as the glue between accelerating devices and the communication channel in order to apply true co-design to the system. In this talk, I will present this work so far and what we should do in the next decade.
3:00 - 4:00	Research Paper Session
Session Chair: Dr. Khaled Hamidouche, The Ohio State University
SONAR: Automated Communication Characterization for HPC Applications, Steffen Lammel, Felix Zahn, and Holger Fröning, University of Heidelberg, Germany
Reducing the overhead of manipulating data-structure on PGAS by ordering memory access at the remote-side, Yuichirou Ajima, Takafumi Nose, Kazushige Saga, Naoyuki Shida, and Shinji Sumimoto, Fujitsu Limited, Japan
4:00 - 4:30	Coffee Break
4:30 - 6:00	Panel: Beyond Speeds and Feeds - What New Capabilities Are Needed for Exascale Interconnects?
Panel Moderator: Ron Brightwell, Sandia National Laboratories
Panel Members:
Jeff Hammond, Research Scientist, Intel Corporation
Daniel Holmes, Applications Consultant in HPC Research, EPCC, United Kingdom
Laxmikant V. (Sanjay) Kale, Professor, University of Illinois at Urbana-Champaign
Jay Lofstead, Senior Member of Technical Staff, Sandia National Laboratories
Tor Skeie, Professor, Adjunct Research Scientist, Simula Research Laboratory, Norway
6:00 - 6:10	Closing Remarks

Organizing Committee

Program Chairs

Dhabaleswar K. (DK) Panda, The Ohio State University
Hari Subramoni, The Ohio State University
Khaled Hamidouche, The Ohio State University

Program Committee

Taisuke Boku, University of Tsukuba, Japan
Ron Brightwell, Sandia National Laboratories
Hans Eberle, NVIDIA
Ada Gavrilovska, Georgia Tech
Brice Goglin, INRIA, France
Dror Goldenberg, Mellanox Technologies
R. Govindarajan, Indian Institute of Science, Bangalore, India
Hai Jin, Huazhong University of Science and Technology, Wuhan, China
Yutong Lu, National University of Defense Technology, Changsha, Hunan Province, China
Takeshi Nanri, Kyushu University, Japan
Sebastien Rumley, Columbia University
Martin Schulz, Lawrence Livermore National Laboratory
John M. Shalf, National Energy Research Scientific Computing Center / Lawrence Berkeley National Laboratory
Tor Skeie, Simula Research Laboratory, Norway
Sayantan Sur, Intel
Xin Yuan, Florida State University