A Scalable Method for Identifying and Displaying the Communication Behavior of Large Scale Applications

Colloq: Speaker: 
Christoph Geile
Colloq: Speaker Institution: 
Jülich Supercomputing Centre
Colloq: Date and Time: 
Fri, 2008-01-25 14:00
Colloq: Location: 
5700-D307
Colloq: Host: 
Phil Roth
Colloq: Host Email: 
rothpc@ornl.gov
Colloq: Abstract: 
To satisfy their increasing demand for computing power, advanced numerical simulations are required to harness larger numbers of processors offered by modern capability computing systems. Unfortunately, satisfactory speedup on many thousands of processors is extraordinarily hard to achieve and requires adequate tool support for performance analysis at larger scales. SCALASCA is a performance analysis tool being developed at the Jülich Supercomputing Centre that searches event traces of parallel applications for undesirable wait states. Scalability of the analysis is maintained by analyzing the traces in parallel. So far, however, the findings of SCALASCA are aggregated without distinguishing between the different peers a process exchanges messages with. This talk presents a preliminary study on how to store and display process-to-process metrics in a scalable manner. Using a few simple metrics as examples, it is shown how the corresponding process-to-process matrices can be efficiently extracted from our trace data and displayed for tens of thousands of processes using the statistical analysis tool R. The talk discusses design alternatives and shows first performance results.
Colloq: Speaker Bio: 
Christoph Geile is a member of the Division Application Support at the Jülich Supercomputing Centre. After finishing his training as a mathematical technical assistant in February 2006, he earned a Diploma degree in Mathematics at the University for Applied Sciences Aachen in December 2007.