ORNL Future Technologies Group improves performance of major DOE applications over 23% using Memphis: a new tool for analyzing m

Collin McCurdy and Jeffrey Vetter, members of ORNL’s Future Technologies Group, have recently developed Memphis, a tool that analyzes memory access patterns in scientific applications on Non-Uniform Memory Access (NUMA) architectures. The authors have been using Memphis to find and fix performance problems in several major DOE applications. So far, these fixes have improved performance on the Cray XT5 at Oak Ridge by 23% for runs at scale of XGC1, and by 24% and 13% for single-node runs of CAM and HYCOM, respectively. The results will be published in April at the 2010 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) as “Memphis: Finding and Fixing NUMA-related Performance Problems on Multi-core Platforms.”

Historically, high-end scientific applications have been immune to NUMA problems because earlier Symmetric Multiprocessing (SMP) platforms offered uniform memory access latency. However, current trends in microprocessor design, including on-chip memory controllers and multi-core processing, are pushing NUMA issues into small-scale systems. Several platforms, such as the AMD Istanbul and Intel Westmere, are currently NUMA between sockets, and upcoming processors will be NUMA within a socket.

Memphis uses hardware performance monitoring to pinpoint memory accesses to data arrays that cause NUMA-related performance problems. The team is continuing to analyze and optimize other applications for NUMA performance problems. For more information, see the paper.
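
To give a rough sense of what pinpointing NUMA-problematic accesses to data arrays can involve, the following Python sketch attributes sampled remote-node memory accesses to named arrays by address range. It is an illustration under stated assumptions, not Memphis's implementation: the `Array` and `Sample` records, the `remote_accesses_by_array` function, and all addresses and array names are hypothetical, and a real tool would obtain its samples from hardware performance counters rather than a hand-written list.

```python
from bisect import bisect_right
from collections import Counter
from typing import NamedTuple


class Array(NamedTuple):
    """Hypothetical record describing one program data array."""
    name: str
    start: int  # first byte address of the array
    size: int   # length of the array in bytes


class Sample(NamedTuple):
    """Hypothetical sampled memory access."""
    address: int    # data address that was accessed
    cpu_node: int   # NUMA node of the core that issued the access
    home_node: int  # NUMA node whose memory holds the address


def remote_accesses_by_array(arrays, samples):
    """Count remote-node accesses per array (assumes non-overlapping arrays)."""
    ordered = sorted(arrays, key=lambda a: a.start)
    starts = [a.start for a in ordered]
    counts = Counter()
    for s in samples:
        if s.cpu_node == s.home_node:
            continue  # local access: not a NUMA problem
        # Find the last array that starts at or before the sampled address.
        i = bisect_right(starts, s.address) - 1
        if i >= 0 and s.address < ordered[i].start + ordered[i].size:
            counts[ordered[i].name] += 1
        else:
            counts["<unknown>"] += 1  # address falls outside every known array
    return counts


# Made-up arrays and samples, for illustration only.
arrays = [Array("density", 0x1000, 0x1000), Array("flux", 0x3000, 0x800)]
samples = [
    Sample(0x1010, cpu_node=0, home_node=1),  # remote access to density
    Sample(0x1800, cpu_node=1, home_node=1),  # local access, ignored
    Sample(0x3004, cpu_node=0, home_node=1),  # remote access to flux
    Sample(0x1FF8, cpu_node=1, home_node=0),  # remote access to density
    Sample(0x2800, cpu_node=0, home_node=1),  # remote, but in no known array
    Sample(0x37FF, cpu_node=1, home_node=0),  # remote access to flux
]

for name, count in remote_accesses_by_array(arrays, samples).most_common():
    print(f"{name}: {count} remote accesses")
```

Arrays with the highest remote-access counts would be the first candidates for changes to data placement or initialization, which is the general kind of fix the announcement describes.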