A Distributed OpenCL Framework using Redundant Computation and Data Replication

Title: A Distributed OpenCL Framework using Redundant Computation and Data Replication
Publication Type: Conference Paper
Year of Publication: 2016
Authors: Kim Junghyun, Jo Gangwon, Jung Jaehoon, Kim Jungwon, and Lee Jaejin
Conference Name: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI)
Abstract

Applications written solely in OpenCL or CUDA cannot execute on a cluster as a whole. Most previous approaches that extend these programming models to clusters are based on a common idea: designating a centralized host node and coordinating the other nodes with the host for computation. However, the centralized host node is a serious performance bottleneck when the number of nodes is large. In this paper, we propose a scalable and distributed OpenCL framework called SnuCL-D for large-scale clusters. SnuCL-D’s remote device virtualization provides an OpenCL application with an illusion that all compute devices in a cluster are confined in a single node. To reduce the amount of control-message and data communication between nodes, SnuCL-D replicates the OpenCL host program execution and data in each node. We also propose a new OpenCL host API function and a queueing optimization technique that significantly reduce the overhead incurred by the previous centralized approaches. To show the effectiveness of SnuCL-D, we evaluate SnuCL-D with a microbenchmark and eleven benchmark applications on a large-scale CPU cluster and a medium-scale GPU cluster.

DOI: 10.1145/2908080.2908094