Future Technologies Colloquium Series


Scalable and High-Performance MPI Design for Very Large InfiniBand Clusters


Sayantan Sur
Department of Computer Science and Engineering, The Ohio State University
May 10, 2007
10:00 AM

ORNL 5100-Auditorium

Host: Jeff Vetter (vetter@ornl.gov)


ABSTRACT:

The ever-increasing demand for computational power from parallel scientific applications, coupled with advances in processing and interconnection technology, is driving up the size of commodity clusters. The Message Passing Interface (MPI) is a popular programming model used by almost all parallel scientific applications. InfiniBand is a cluster interconnect that is based on open standards and is gaining widespread acceptance in the HPC community. As commodity clusters scale up, MPI implementations over InfiniBand are expected to scale accordingly and pass the benefits on to end applications. MVAPICH is a popular implementation of MPI over InfiniBand that is used by several hundred top computing sites around the world. In this talk, I will focus on several novel features of InfiniBand, namely Shared Receive Queues (SRQ), Remote Direct Memory Access Read (RDMA-Read), Selective Message Interrupts, and Atomic operations. These features enable entirely new MPI designs. I will describe new messaging protocols and associated flow control mechanisms designed to leverage the SRQ feature of InfiniBand. These protocols reduce resource consumption by an order of magnitude and offer the best performance on large-scale clusters. I will also describe a protocol design that combines RDMA-Read with the selective message interrupt feature to achieve nearly complete overlap of computation and communication. These designs can reduce MPI application wait time and improve overall application performance. Finally, I will describe current research in this direction and outline the challenges in designing programming models and architectures for next-generation parallel systems.


# # #