NIC-level Support for Collective Communication and Synchronization with Myrinet and Quadrics
Collective operations such as barrier, broadcast, and reduction are very
important in many parallel and distributed programs. For example, an efficient
implementation of barrier is important because, while processors are waiting
at a barrier, generally no computation can be performed, which limits parallel
speedup. The efficiency of barrier operations also affects the granularity of
parallel computation: the cheaper a barrier is, the finer-grained the
computation between synchronization points can be.
Some Network Interface Cards (NICs), such as those of Myrinet, have programmable
processors that can be customized to support collective communication.
We take on the challenge of efficiently designing and implementing NIC-level
collective operations, in which the protocol is processed at the NIC rather
than at the host. Because the protocol runs at the NIC, messages for collectives
need not be passed up to the host only to have the next message sent back
down from the host to the NIC. Instead, upon receiving a message, the
NIC can immediately send the next message.
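To make the saving concrete, the following sketch simulates a barrier in which each NIC, on receiving one round's message, immediately sends the next round's message. The choice of a dissemination barrier, the function name `simulate_dissemination_barrier`, and the latency parameters `wire_us`, `nic_us`, and `host_path_us` are all hypothetical, chosen only for illustration; they are not the protocol or measurements of this project. Setting `host_path_us` above zero models a host-based protocol that must cross from the NIC to the host and back before each forward.

```python
import heapq
import math


def simulate_dissemination_barrier(n, wire_us, nic_us, host_path_us=0.0):
    """Hypothetical model: per-node completion times (us) of a dissemination barrier.

    In round k, node i sends to (i + 2**k) % n and waits for a message from
    (i - 2**k) % n. host_path_us = 0.0 models NIC-level forwarding; a positive
    value models the extra NIC -> host -> NIC trip a host-based protocol pays
    before each forward. All latency values are illustrative assumptions.
    """
    rounds = math.ceil(math.log2(n)) if n > 1 else 0
    current = [0] * n                     # round each node is currently in
    received = [set() for _ in range(n)]  # rounds whose message has arrived
    done = [0.0] * n                      # time each node completes the barrier
    events = []                           # min-heap of (arrival_time, dest, round)

    def send(src, k, t):
        dest = (src + (1 << k)) % n
        heapq.heappush(events, (t + nic_us + wire_us, dest, k))

    # The host posts the barrier at t = 0; every node sends its round-0 message.
    if rounds:
        for i in range(n):
            send(i, 0, 0.0)

    while events:
        t, node, k = heapq.heappop(events)
        received[node].add(k)  # a message may arrive before its round starts
        # Complete every round whose incoming message is now present and
        # forward the next round's message right away; the host is involved
        # only when host_path_us > 0.
        while current[node] < rounds and current[node] in received[node]:
            current[node] += 1
            if current[node] < rounds:
                send(node, current[node], t + host_path_us)
            else:
                done[node] = t
    return done


nic_level = simulate_dissemination_barrier(8, wire_us=5.0, nic_us=1.0)
host_based = simulate_dissemination_barrier(8, wire_us=5.0, nic_us=1.0, host_path_us=10.0)
print(max(nic_level), max(host_based))  # 18.0 38.0
```

With these made-up numbers, the host-based variant pays the host round trip once for every intermediate round, that is, ceil(log2 n) - 1 times, while the NIC-level variant avoids it entirely; the gap therefore widens as the host crossing grows relative to the wire latency.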
Current research in this direction investigates the following issues:
Host-Bypass NIC-Level Collective Operations on Programmable NIC