Timestamp Synchronization for Event Traces of Large-Scale Message-Passing Applications
Daniel Becker
Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Jülich, Germany
October 18, 2007
02:00 PM
ORNL, 5700-L202
Host: Philip Roth (rothpc@ornl.gov)
ABSTRACT:
Identifying wait states in event traces of message-passing applications requires
measuring temporal displacements between concurrent events. In the
absence of synchronized hardware clocks, linear interpolation
techniques can already account for differences in offset and drift,
assuming that the drift of an individual processor is not time
dependent. However, inaccuracies and drifts varying in time can still
cause violations of the logical event ordering. The controlled logical
clock algorithm accounts for such violations in point-to-point
communication by shifting message events in time as much as needed
while trying to preserve the length of intervals between local events.
In this talk, I describe how the controlled logical clock is extended
to collective communication to enable a more complete correction of
realistic message-passing traces. In addition, I present a parallel
version of the algorithm that is intended to scale to thousands of
application processes and outline its implementation within the
framework of the Scalasca toolkit.
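
As a rough illustration of the two steps the abstract mentions, the sketch
below first maps local timestamps onto a master clock by linear interpolation
between two offset measurements, then repairs any receive event that still
appears to precede its matching send. It is a simplified stand-in, not the
controlled logical clock algorithm: it carries the full shift forward to all
later local events, whereas the controlled logical clock limits how far a
correction propagates. The function names, the event layout, and the
min_latency parameter are assumptions made for this example only.

```python
def interpolate(t, t1, o1, t2, o2):
    """Map local time t to master time.

    o1 and o2 are clock offsets (master minus local) measured at local
    times t1 and t2, e.g. at program start and end. The drift is assumed
    constant between the two measurements.
    """
    drift = (o2 - o1) / (t2 - t1)
    return t + o1 + drift * (t - t1)


def repair_receives(events, min_latency):
    """Enforce the clock condition on one process's events.

    events is a time-ordered list of (timestamp, send_timestamp) pairs;
    send_timestamp is None for events that are not receives. A receive
    must not be earlier than its send plus min_latency. When it is, the
    receive is moved forward and the same shift is applied to all later
    events, which keeps the intervals between them unchanged.
    """
    shift = 0.0
    corrected = []
    for ts, send_ts in events:
        new_ts = ts + shift
        if send_ts is not None and new_ts < send_ts + min_latency:
            new_ts = send_ts + min_latency
            shift = new_ts - ts
        corrected.append(new_ts)
    return corrected
```

For example, a receive recorded at 2.0 whose matching send carries the
timestamp 2.5 violates the logical event ordering; with a minimum latency of
0.5 the sketch moves the receive to 3.0 and shifts every later local event by
the same 1.0.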
# # #