High-Level Accelerator-Style Programming of Clusters with Triolet

Colloq: Speaker: 
Christopher Rodrigues
Colloq: Speaker Institution: 
University of Illinois at Urbana-Champaign
Colloq: Date and Time: 
Mon, 2014-11-10 10:30
Colloq: Location: 
Building 5700, Room F234
Colloq: Host: 
Jeffrey S. Vetter
Colloq: Host Email: 
vetter@ornl.gov
Colloq: Abstract: 
Container libraries are popular for parallel programming due to their simplicity. Programs invoke library operations on entire containers, relying on the library implementation to turn groups of operations into efficient parallel loops and communication. However, their suitability for parallel programming on clusters has been limited because they offer only a small repertoire of parallel algorithm implementations under the hood. In this talk, I will present Triolet, a high-level functional language for using a cluster as a computational accelerator. Triolet improves upon the generality of prior distributed container library interfaces by separating concerns of parallelism, loop nesting, and data partitioning. I will discuss how this separation is used to efficiently decompose and communicate multidimensional array blocks, as well as to generate irregular loop nests from computations with variable-size temporary data. These loop-building algorithms are implemented as library code. Triolet’s compiler inlines and specializes library calls to produce efficient parallel loops. The resulting code often performs comparably to handwritten C. For several compute-intensive loops running on a 128-core cluster (8 nodes with 16 cores per node), Triolet performs significantly faster than sequential C code, with performance ranging from slightly faster to 4.3× slower than manually parallelized C code. Thus, Triolet demonstrates that a library of container traversal functions can deliver cluster-parallel performance comparable to manually parallelized C code without requiring programmers to manage parallelism. Triolet carries lessons for the design of runtimes, compilers, and libraries for parallel programming using container APIs.
Colloq: Speaker Bio: 
Christopher Rodrigues received his Ph.D. in Electrical Engineering from the University of Illinois. He is one of the developers of the Parboil GPU benchmark suite. A computer architect by training, he has chased parallelism up the software stack, having worked on alias and dependence analysis, parallel programming for GPUs, statically typed functional language compilation, and the design of parallel libraries. He is interested in reducing the pain of writing and maintaining high-performance parallel code.