SHOC's configure script didn't detect CUDA/OpenCL/MPI properly, what's going on?

SHOC's configure script may not automatically find every package it can use on your system. You might notice this during the configure step itself, or only afterward, when you're left wondering why some versions of the benchmark programs (e.g., the OpenCL or MPI versions) were never built.

First, check the output from running the SHOC configure script. There are two places to look: the console output while your configure ran, and the config.log file that the configure script produced. If the console output suggests SHOC's configure script couldn't find a usable installation of a package such as CUDA, OpenCL, or MPI, look in the config.log file to see any error messages that may have been generated when the configure script was testing for that package.

SHOC's configure script expects to find some things in your PATH. For instance, if your system includes CUDA, it expects to find the nvcc program in your PATH. Since OpenCL uses a library-based approach, there is no OpenCL executable to test for, and so SHOC's configuration script has to be told where to find OpenCL headers and possibly also libraries using the CPPFLAGS and LDFLAGS variables when the configuration script is run. See examples of how to do this type of configuration in the shell scripts in the config directory of the SHOC distribution, such as the conf-linux-openmpi.sh file.
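For example, pointing configure at an OpenCL installation might look like this (the /usr/local/opencl paths are purely illustrative; substitute the locations on your system):

```shell
# Tell configure where the OpenCL headers and libraries live.
# These paths are hypothetical -- adjust them for your machine.
./configure \
    CPPFLAGS="-I/usr/local/opencl/include" \
    LDFLAGS="-L/usr/local/opencl/lib64"
```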

Another common configuration problem is that SHOC was unable to detect a working MPI installation. SHOC's current configure script needs to know the flags needed to compile and link MPI programs. There are two scripts in the config directory of the SHOC distribution that show examples of how to configure SHOC with MPICH2/MVAPICH2 (conf-linux-mpich2.sh) and OpenMPI (conf-linux-openmpi.sh). Currently, SHOC's configure script is not smart enough to detect all the MPI information it needs if you just type ./configure. In the future, we plan to improve the SHOC configure script to do more of the MPI detection automatically.
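With autoconf-style configure scripts, one common way to supply the MPI flags is to name the MPI compiler wrappers on the configure line; here's a sketch under that assumption (the bundled conf-linux-openmpi.sh shows the exact variables SHOC expects, which may differ from these):

```shell
# Assume the MPI compiler wrappers already know the right include/link flags
./configure CC=mpicc CXX=mpicxx
```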

If all else fails and you still can't diagnose a configuration problem, send a question to shoc-help@elist.ornl.gov. Include the version numbers of relevant software such as SHOC, your compiler, CUDA and/or OpenCL, MPI (if any), and your operating system. Also, include the config.log file.

How can I use SHOC to do scaling studies on my GPU cluster?

SHOC uses MPI to run across a cluster. Here's a graph showing how Stencil2D scales on Keeneland, a cluster with 3 Tesla M2070 GPUs per node.

For this graph, I used weak scaling (the amount of work increases with the number of GPUs) and a large problem size, so ideal scaling would be a flat horizontal line. For the most part it looks pretty good, except for the abrupt jump at the beginning, where you move from one GPU to more than one and inter-process communication first kicks in.

To make this graph, I wrote a little Perl script. Here's what it looks like:

#!/usr/bin/perl
# Simple script to run the Stencil2D benchmark at several scales.
# Make sure you have a directory called stencil for your results;
# each run's output goes to a file named after its process count.
system("mpirun -np 1 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 1,1 >./stencil/1.out");
system("mpirun -np 3 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 1,3 >./stencil/3.out");
system("mpirun -np 6 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 2,3 >./stencil/6.out");
system("mpirun -np 9 ../bin/TP/CUDA/Stencil2D -s 4 --num-iters 1000 --msize 3,3 >./stencil/9.out");
# and so on ...

What are all these parameters? OK, here's a quick list:

  • -np - the number of MPI processes (a flag to mpirun, not to the benchmark)
  • -s 4 - selects the large problem size
  • --num-iters - the number of iterations of the stencil kernel to execute
  • --msize - the topology of the MPI processes (Stencil2D operates on one big 2D grid of data; this specifies how that grid is divided among the MPI processes)
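One relationship worth spelling out, inferred from the runs above: the --msize rows,cols grid must account for every MPI rank, so rows times cols should equal the value passed to -np. A tiny sketch of that sanity check (the helper function is mine, not part of SHOC):

```shell
# Check that an MPI process grid covers exactly the requested rank count.
check_topology() {
    np=$1; rows=$2; cols=$3
    if [ $((rows * cols)) -eq "$np" ]; then
        echo "np=$np --msize $rows,$cols: ok"
    else
        echo "np=$np --msize $rows,$cols: MISMATCH"
    fi
}

check_topology 6 2 3   # the -np 6 / --msize 2,3 run above
check_topology 9 3 3   # the -np 9 / --msize 3,3 run above
```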

After I ran the script, I found the name of the test I wanted. In this case, it's “DP_Sten2D(median)”, the median execution time for the double-precision version.

A little grepping then gave me a nice tab-separated value file for Excel.

$ cd stencil
$ grep "DP_Sten2D(median)" ./* >results.tsv
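If you'd rather have the GPU count as its own column (handy as the x-axis in Excel), the grep output can be massaged a bit further. A self-contained sketch: the two result files below hold made-up stand-in numbers purely so the pipeline has input, and the \t in the sed replacement assumes GNU sed.

```shell
# Create stand-in result files named by process count, as the Perl script does.
mkdir -p stencil
printf 'DP_Sten2D(median)\t12.3\n' > stencil/1.out
printf 'DP_Sten2D(median)\t12.9\n' > stencil/3.out

# Rewrite each hit "stencil/3.out:<line>" as "3<TAB><line>".
grep "DP_Sten2D(median)" stencil/*.out \
    | sed 's|^stencil/||; s|\.out:|\t|' > stencil/results.tsv
cat stencil/results.tsv
```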

Where can I find typical performance results?

We're currently evaluating a new web frontend for SHOC results based on Google's Public Data Explorer. Let us know what you think.

To view some older results from version 1.0.3, check out ShocTown: http://ft.ornl.gov/~kspafford/shoctown/Shoctown.html

Also, if you would like to contribute some results, please send a tarball containing your results.csv file and Logs folder to shoc-dev@elist.ornl.gov.

I have a dual I/O hub platform. How can I get the driver to use numactl for pinning?

We'll be adding pinning support to the driver script in a future release. In the meantime, you can get the same benefit by modifying the driver script yourself. Here's an example for the HP SL390, where the desired NUMA mapping is CPU 0 → GPU 0, CPU 1 → GPU 1, and CPU 1 → GPU 2.
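For context, the modification below makes the driver prefix each benchmark invocation with numactl, so the command it builds ends up looking roughly like this (the benchmark and output file names here are just illustrative):

```shell
# Bind both memory allocation and CPU placement to NUMA node 1
# before launching a run against GPU device 1.
numactl --membind=1 --cpubind=1 ./bin/Serial/CUDA/BusSpeedDownload -s 4 -d 1 \
    > BusSpeedDownload.out 2> BusSpeedDownload.err
```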

Go into the driver.pl script, and find the “buildCommand” subroutine.

Start right after “my $str”, and change the rest of the routine to:

my $numa;
# $_[1] is the device number: GPU 0 sits on NUMA node 0,
# GPUs 1 and 2 on NUMA node 1.
if ($_[1] == 0) {
    $numa = 0;
} else {
    $numa = 1;
}

if (getArg("-read-only")) {
    $str = "echo " . $_[0];
} else {
    $str = "numactl --membind=$numa --cpubind=$numa "
         . $bindir . "/Serial/"
         . $platformString
         . $_[0]
         . " -s $sizeClass -d "
         . $_[1] . " >"
         . buildFileName( $_[0], $_[1] ) . " 2>"
         . buildFileName( $_[0], $_[1] ) . ".err";
}
# print "Built command: $str \n";
return $str;
shoc/faq.txt · Last modified: 2011/10/27 16:45 by kspafford