FT develops a highly-scalable global address space model for petascale computing on the Cray XT5

Vinod Tipparaju, Weikuan Yu, and Jeffrey Vetter, members of ORNL’s Future Technologies group, have designed and developed a highly optimized Aggregate Remote Memory Copy Interface (ARMCI) runtime library for the 2.3-petaflop Cray XT5 computer at Oak Ridge National Laboratory. The results were published in May at the 2010 ACM International Conference on Computing Frontiers (CF’10) in Bertinoro, Italy as “Enabling a highly-scalable global address space model for petascale computing” and at the 2010 International Supercomputing Conference in Hamburg, Germany as “Cooperative Server Clustering for a Scalable GAS Model on Petascale Cray XT5 Systems”.

In the first paper, the team describes the design and implementation of ARMCI for the Cray XT5 and its optimization with the flow intimation technique, which yields significant improvements for the Global Arrays (GA) programming model and for a real-world chemistry application, NWChem, at scales from small jobs up to 180,000 cores. In the second paper, the team examines the memory requirements of ARMCI on the Cray XT5 and introduces a new technique, “cooperative server clustering”, to improve the memory scalability of ARMCI communication servers. The technique significantly lowers the memory requirements of ARMCI, further boosting the scalability of global address space (GAS) scientific applications. The team demonstrated that the optimization reduces the memory footprint by a factor of five for a program of 9600 processes and reduces the total execution time of the NWChem scientific application by 45% on 2400 processes.