Utilization and Extension of Task-based Runtime for High Performance Dense Linear Algebra Applications

Colloq: Speaker: 
Chongxiao Cao
Colloq: Speaker Institution: 
University of Tennessee
Colloq: Date and Time: 
Thu, 2017-02-23 10:00
Colloq: Location: 
Building 5700, Room MS-A104
Colloq: Host: 
Jeff Vetter
Colloq: Host Email: 
vetter@ornl.gov
Colloq: Abstract: 
On the road to Exascale computing, dynamic task-based runtimes can narrow the gap between peak hardware performance and achieved application performance by providing executions that unfold based solely on the dataflow between tasks. In this presentation, I will introduce two parts of my PhD work related to task-based runtimes. The first part is the design of a unified framework for running high-performance dense linear algebra applications on platforms equipped with multiple GPUs and multiple Xeon Phi coprocessors. A lightweight task-based runtime is used to manage the resource-specific workload and to control the dataflow and parallel execution in the hybrid system. The second part introduces a fault-tolerant design for a task-based runtime. Three extensions to a dynamic task-based runtime have been explored to build a generic framework that provides soft-error resilience: a sub-DAG method, a data logging method, and an algorithm-based fault tolerance (ABFT) method. We also go one step further, extending the general data logging method to a remote version that provides resilience against hard errors.
Colloq: Speaker Bio: 
Chongxiao "Shawn" Cao is currently a Ph.D. candidate in Computer Science at the University of Tennessee, Knoxville. He began the Ph.D. program in August 2011. He works as a Research Assistant in the Innovative Computing Laboratory (ICL) under the guidance of Dr. Jack Dongarra and Dr. George Bosilca. His research interests include fault tolerance in parallel computing, dynamic task-based runtimes, and high-performance linear algebra routines for distributed heterogeneous architectures.