Utilization and Extension of Task-based Runtime for High Performance Dense Linear Algebra Applications
Submitted by hebertem on Fri, 2017-04-28 16:32
Colloq: Speaker Institution:
University of Tennessee
Colloq: Date and Time:
Thu, 2017-02-23 10:00
Building 5700, Room MS-A104
Colloq: Host Email:
On the road to exascale computing, dynamic task-based runtimes can reduce the gap between hardware peak performance and application performance by providing executions that unfold based solely on the dataflow between tasks. In this presentation, I will introduce two parts of my Ph.D. work on task-based runtimes. The first part is the design of a unified framework for running high-performance dense linear algebra applications on platforms equipped with multiple GPUs and multiple Xeon Phi coprocessors. A lightweight task-based runtime is used to manage the resource-specific workload and to control the dataflow and parallel execution in hybrid systems. The second part introduces a fault-tolerant design for a task-based runtime. Three additions to a dynamic task-based runtime have been explored to build a generic framework providing soft-error resilience: a sub-DAG method, a data-logging method, and an algorithm-based fault tolerance method. We also go one step further and extend the general data-logging method to a remote version that provides resilience against hard errors.
Colloq: Speaker Bio:
Chongxiao "Shawn" Cao is currently a Ph.D. candidate in Computer Science at the University of Tennessee, Knoxville. Chongxiao started the Ph.D. program in August 2011. He works as a Research Assistant in the Innovative Computing Laboratory (ICL) under the guidance of Dr. Jack Dongarra and Dr. George Bosilca. His research interests include fault tolerance in parallel computing, dynamic task-based runtimes, and high-performance linear algebra routines for distributed heterogeneous architectures.