ORNL Researchers Accelerate Materials Application with Graphics Processing Units (GPUs)

In a recent article in the journal ‘Parallel Computing,’ ORNL researchers Jeremy S. Meredith, Gonzalo Alvarez, Thomas A. Maier, Thomas C. Schulthess, and Jeffrey S. Vetter show how they accelerated the quantum Monte Carlo simulation code DCA++ by using graphics processing units (GPUs) as general-purpose computational devices (also known as GPGPUs). Although GPUs were originally designed for real-time rendering, their high performance and relatively low cost make them an attractive target for scientific computation. Recent community efforts have addressed the programming challenges, and new languages such as CUDA and OpenCL are being widely adopted.

However, because the original task of GPUs, rendering, has traditionally treated accuracy as a secondary goal, sacrifices have sometimes been made. In fact, much of the deployed GPU hardware supports only single-precision arithmetic, and even that is not always as accurate as single precision on a commodity CPU.

In the paper, the team investigated the accuracy and performance characteristics of GPUs on DCA++, including results from a preproduction GPU capable of double precision. They then accelerated the full DCA++ application while investigating its tolerance to the different levels of arithmetic precision available on GPUs. The results show that although DCA++ has some sensitivity to arithmetic precision, the single-precision GPU results were comparable to single-precision CPU results. When the code was accelerated on a fully GPU-enabled cluster, any remaining inaccuracy attributable to GPU precision was negligible. The GPU version retained sufficient accuracy for scientifically meaningful results while still delivering significant speedups: full parallel runtimes on the GPU cluster were five times faster than those on commodity microprocessors alone.
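To make the single- versus double-precision question more concrete, here is a minimal sketch. It is a generic illustration, not DCA++ code and not the authors' method. It sums the value 0.1 one million times, once with every partial sum rounded to IEEE single precision and once in ordinary double precision. The workload and the helper names `to_f32`, `naive_sum`, and `relative_error` are hypothetical choices made for illustration. Single precision is emulated on the CPU with Python's standard `struct` module, so the sketch does not model any GPU-specific rounding behavior.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (IEEE double) to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def naive_sum(values, single: bool) -> float:
    """Left-to-right summation.

    If single is True, each input and each partial sum is rounded to float32,
    emulating a float accumulator (hypothetical workload, not a DCA++ kernel).
    Otherwise the sum is accumulated in double precision.
    """
    total = 0.0
    for v in values:
        if single:
            # The double-precision sum of two float32 values here is exact, so a
            # single rounding to float32 matches a native float32 addition.
            total = to_f32(total + to_f32(v))
        else:
            total += v
    return total


def relative_error(approx: float, exact: float) -> float:
    """Relative error of approx with respect to a nonzero reference value."""
    return abs(approx - exact) / abs(exact)


if __name__ == "__main__":
    n = 1_000_000
    values = [0.1] * n
    exact = n / 10  # reference value 100000.0
    s32 = naive_sum(values, single=True)
    s64 = naive_sum(values, single=False)
    print(f"float32 accumulator: {s32!r} (relative error {relative_error(s32, exact):.2e})")
    print(f"float64 accumulator: {s64!r} (relative error {relative_error(s64, exact):.2e})")
```

In this toy case the single-precision accumulator drifts by roughly one percent, while the double-precision error is many orders of magnitude smaller. Long accumulations like this one are why an application's tolerance to reduced precision has to be measured rather than assumed.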