Scalable and High-Performance MPI Design for Very Large InfiniBand Clusters
Sayantan Sur
Department of Computer Science and Engineering The Ohio State University
May 10, 2007
10:00 AM
ORNL 5100-Auditorium
Host: Jeff Vetter (vetter@ornl.gov)
ABSTRACT:
The ever-increasing demand for more computational power by parallel
scientific applications, coupled with advances in processing and
interconnection technology, is driving the size of commodity clusters
up. The Message Passing Interface (MPI) is a popular programming
model that is used by almost all parallel scientific applications.
InfiniBand is a cluster interconnect which is based on open standards
and is gaining widespread acceptance in the HPC community. As
commodity clusters scale up, MPI implementations over InfiniBand
are expected to scale accordingly and pass on the benefits to end
applications. MVAPICH is a popular implementation of MPI over
InfiniBand which is used by several hundred top computing sites all
around the world.
In my talk, I will focus on several novel features of InfiniBand, namely Shared
Receive Queues (SRQ), Remote Direct Memory Access Read
(RDMA-Read), Selective Message Interrupts and Atomic operations. These
novel features enable entirely new MPI designs. New messaging
protocols and associated flow control mechanisms designed to leverage
the SRQ feature in InfiniBand will be described in this talk.
These new protocols reduce resource consumption by an order of
magnitude and offer the best performance on large-scale clusters. I
will also describe a protocol design that leverages RDMA-Read coupled
with the selective message interrupt feature to achieve nearly complete
overlap of computation and communication. These designs can reduce MPI
application wait time and improve overall application performance.
Finally, this talk will describe current research being carried out in
this direction and outline the challenges in designing programming
models and architectures for next-generation parallel systems.
# # #