High Performance Computing for Massive Data Analysis

Colloq: Speaker: 
Kamesh Madduri
Colloq: Speaker Institution: 
Georgia Institute of Technology
Colloq: Date and Time: 
Mon, 2008-03-17 10:00
Colloq: Location: 
Building 5100, Room 128 (JICS Auditorium)
Colloq: Host: 
Jeff Vetter
Colloq: Host Email: 
Colloq: Abstract: 
Graph-theoretic abstractions are at the core of data-intensive problems arising in social and technological network analysis (e.g., identification of implicit online communities, viral marketing strategies, quantifying centrality and influence in interaction networks, web algorithms), systems biology (for instance, interactome analysis, epidemiological studies, disease modeling), and homeland security (e.g., detecting trends, anomalous patterns from socio-economic interactions and communication data). Due to their large memory footprint, fine-grained computational granularity, and low degrees of spatial locality, massive graph problems pose serious challenges on current parallel machines. In this talk, we present novel algorithmic approaches for enabling large-scale graph analysis. Our shared-memory implementations on high-end multicore servers from IBM and Sun, and massively multithreaded architectures such as the Cray XMT, are the first known to achieve significant parallel speedup for traversal, connectivity, and centrality problems on graph instances of the order of billions of vertices and edges. We also present SNAP (Small-world Network Analysis and Partitioning), an open-source parallel graph framework that we have designed for exploratory network analysis. With a combination of algorithm engineering and novel parallel graph algorithms in SNAP, we achieve a speedup of nearly two orders of magnitude over current state-of-the-art approaches for community identification and centrality in real-world networks.
Colloq: Speaker Bio: