Orchestrating a Cache-Memory Concert for Massive Parallelism

Colloq: Speaker: 
Weikuan Yu
Colloq: Speaker Institution: 
Florida State University
Colloq: Date and Time: 
Thu, 2016-03-03 10:00
Colloq: Location: 
Building 5100, Room 128 (JICS Auditorium)
Colloq: Host: 
Jeff Vetter
Colloq: Host Email: 
vetter@ornl.gov
Colloq: Abstract: 
Massive parallelism has risen rapidly in modern processors, with core counts expected to reach the thousands within a decade. Memory bandwidth, in contrast, lags behind, causing an ever-growing gap between off-chip memory bandwidth and the aggregate computing power of a machine. This massive parallelism raises a host of challenging issues. In particular, it causes an explosion of recently accessed data sets that exhibit temporal locality of very short duration and spatial locality of varying strides, often with distinctive striding patterns in their memory accesses. Conventional cache algorithms consequently struggle to make effective use of limited cache capacity. In addition, the increasing width of massive parallelism congests the memory pipeline, stalling the warp schedulers and degrading performance. Hence, there is a critical need for a cache-memory concert that orchestrates cache and memory management together to meet the challenges of massive parallelism. This talk will present our recent research toward such a cache-memory concert. First, we introduce a new cache indexing method that adapts to memory accesses with different strides, eliminates intra-warp associativity conflicts, and improves GPU cache performance. Then, we present a divergence-aware cache management scheme that orchestrates L1D cache management and warp scheduling together for GPGPUs. Finally, we show the development of a cutting-edge warp-scheduling algorithm that predicts the resource demands of active warps and throttles the consumption of load-store units to sustain effective warp parallelism.
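To illustrate why cache indexing matters for strided accesses, the sketch below contrasts conventional modulo set indexing with XOR-based set indexing, a well-known technique for spreading power-of-two strides across cache sets. This is a generic illustration of the problem the abstract describes, not the speaker's actual indexing scheme; the cache geometry (64 sets, 128-byte lines) is hypothetical.

```python
# Hypothetical cache geometry for illustration only.
NUM_SETS = 64                            # number of cache sets (power of two)
LINE_BYTES = 128                         # bytes per cache line
SET_BITS = NUM_SETS.bit_length() - 1     # log2(NUM_SETS) = 6

def modulo_index(addr):
    """Conventional indexing: low-order line-address bits select the set."""
    return (addr // LINE_BYTES) % NUM_SETS

def xor_index(addr):
    """XOR indexing: fold higher tag bits into the set index so that
    power-of-two strides no longer all map to the same set."""
    line = addr // LINE_BYTES
    return (line ^ (line >> SET_BITS)) % NUM_SETS

# A stride of NUM_SETS * LINE_BYTES bytes: every access collides in one
# set under modulo indexing, while XOR indexing spreads them out.
stride = NUM_SETS * LINE_BYTES
addrs = [i * stride for i in range(32)]
mod_sets = {modulo_index(a) for a in addrs}
xor_sets = {xor_index(a) for a in addrs}
print(len(mod_sets), len(xor_sets))  # prints "1 32"
```

Under modulo indexing all 32 strided accesses land in a single set, exhausting its associativity; the XOR mapping distributes them across 32 distinct sets, which is the kind of conflict elimination the abstract refers to.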
Colloq: Speaker Bio: 
Dr. Weikuan Yu is an Associate Professor in the Department of Computer Science at Florida State University (FSU). He served as a Research Staff Member in the Future Technologies Group at Oak Ridge National Laboratory until 2009, and then as an assistant and later associate professor at Auburn University until 2015. Dr. Yu founded the Parallel Architecture and Systems Laboratory (PASL) at Auburn and FSU. His research interests span a multitude of technical areas, including processor-memory architecture, big data analytics in social networks, high-speed interconnects, cloud and distributed systems, and storage and I/O systems. Many of Dr. Yu's graduate students have joined prestigious organizations such as Boeing, Amazon, IBM, Intel, Yahoo, and government laboratories upon graduation.