Colloq: Date and Time:
Mon, 2015-07-20 10:00
Building 5100, Room 128 (JICS Auditorium)
Data collection and analysis are rapidly changing the way scientific, national security, and business communities operate. They have emerged as a fourth paradigm of science, with American economic competitiveness and national security depending increasingly on the insightful analysis of large data sets. While extreme-scale analytics shares many of the same computing issues as extreme-scale scientific simulation, the nature of the problems and data creates important differences. The volume, velocity, variety, and veracity of analytic data set it apart from scientific data. Moreover, the data does not partition neatly along physical boundaries, and algorithms do not map efficiently to bulk synchronous processes with nearest-neighbor communication. This is true for both traditional table-driven machine learning applications and emerging graph methods. While natural partitions can be found, irregular inter-partition connections and extreme load imbalance limit scalability to a small number of nodes for runtime systems that assign groups of data to single locales. Without scaling to a large number of nodes, in-memory solutions based on such runtime systems are no more attractive than file-based solutions.
While at PNNL, I architected GEMS, a multithreaded semantic graph engine. The framework had three components: 1) a SPARQL front end that translates SPARQL queries into data-parallel C code; 2) a semantic graph engine with scalable multithreaded algorithms for query processing; and 3) a custom multithreaded runtime layer for scalable performance on conventional cluster systems. Our objectives were twofold: 1) to scale system size as data sizes increase, and 2) to maintain query throughput as system size grows.
In this talk, I will summarize the data challenges facing scientists, intelligence analysts, and business leaders. I will discuss table and graph analytic methods and the problems introduced by the unbalanced distribution of real-world data. I will describe GEMS in detail, focusing on the graph engine and runtime layer, and present some performance results.
Colloq: Speaker Bio:
Dr. Feo received his Ph.D. in Computer Science from The University of Texas at Austin. He began his career at Lawrence Livermore National Laboratory, where he managed the Computer Science Group and was the principal investigator of the Sisal Language Project. Dr. Feo then joined Tera Computer Company (now Cray Inc.), where he was a principal engineer and product manager for the first two generations of Cray’s multithreaded architecture. After a short two-year “sabbatical” at Microsoft, where he led a software group developing a next-generation virtual reality platform, he joined PNNL as the Director of the Center for Adaptive Supercomputer Software and Principal Investigator of a large DOD project in graph analytics. Most recently, Dr. Feo was VP of Engineering at Context Relevant. Dr. Feo’s research interests are parallel programming, graph algorithms, multithreaded architectures, functional languages, and performance studies. He has published extensively in these fields. He has held academic positions at UC Davis and is an adjunct faculty member at Washington State University.