Towards Understanding the Performance of FPGAs Using OpenCL Benchmarks

Colloq: Speaker: 
Naoya Maruyama
Colloq: Speaker Institution: 
RIKEN Advanced Institute for Computational Science
Colloq: Date and Time: 
Mon, 2016-03-21 10:00
Colloq: Location: 
Building 5100, Room 128 (JICS Auditorium)
Colloq: Host: 
Jeff Vetter
Colloq: Host Email:
Colloq: Abstract: 
We evaluate the performance of a subset of the benchmarks available in the Rodinia Suite, using Altera’s OpenCL SDK and the Terasic DE5-Net FPGA board, equipped with an Altera Stratix V GXA7 FPGA, and present timing and power estimation results along with a comparison with a modern CPU and GPU. The results are presented for multiple versions of each benchmark, each with a varying degree of optimization for FPGAs, ranging from direct ports of the initial OpenCL implementation to loop-pipelined kernels specifically optimized for FPGAs. Our results show that, while it is possible to use a common programming language available for other, more widely used accelerators in HPC, the implementation method that is optimal for FPGAs differs significantly from those for other accelerators such as GPUs. Specifically, we find that multi-threaded kernels typically used for GPUs do not perform as efficiently as kernels that use FPGA-specific optimizations such as sliding windows. However, by exploiting these FPGA-specific optimizations, FPGAs with OpenCL show promising performance. Our results using the Altera Stratix V 5SGXA7 FPGA indicate that, with FPGA-specific optimizations, it is possible to achieve up to 3.9x better power efficiency in comparison to an Nvidia K20C GPU.
Colloq: Speaker Bio: 
Naoya Maruyama is a Team Leader at RIKEN Advanced Institute for Computational Science, where he leads the HPC Programming Framework Research Team. His team focuses on high-level parallel frameworks for computational science applications to support productive and highly efficient computing with large-scale parallel systems such as RIKEN’s K computer. He is also the Principal Investigator for a JST Post Petascale System Software Project, where his team has been developing domain-specific programming models for mesh-based and particle-based applications that are primarily targeted at achieving high scalability and efficiency on large-scale GPU machines. Prior to joining RIKEN, he was an Assistant Professor at Tokyo Institute of Technology, where he co-led several forward-looking GPU-computing projects, which eventually led to the TSUBAME GPU supercomputer. He has won several awards, including a Gordon Bell Prize in 2011. He received his Ph.D. in Computer Science from Tokyo Institute of Technology in 2008. He is a member of the ACM and IEEE Computer Society.