Download

Sign up for the SHOC announcement list at https://elist.ornl.gov/mailman/listinfo/shoc-announce to receive alerts about new releases and other changes.

Supported Platforms

The Dakar team intends SHOC to be useful on any platform with an OpenCL implementation. However, the Dakar team develops and tests SHOC primarily on Linux and Mac OS X systems. The following describes the supporting packages and versions commonly used by SHOC developers.

Linux

  • A recent RedHat-family OS distribution (Fedora or RHEL).
  • A working OpenCL implementation. The Dakar team has used the following implementations:
    • NVIDIA GPU Computing SDK version 3.2 and later
    • AMD APP SDK version 2.2 or later
  • (Optional) CUDA 3.2 or later.

This list describes the platforms to which the Dakar team has access for development and testing. SHOC may work on other Linux distributions and with OpenCL implementations other than those listed here. Modifications may be needed to account for differing OpenCL header and library paths, system library versions, and compiler versions or vendors.

Mac OS X

  • Mac OS X 10.6 (Snow Leopard) or later.
  • Xcode 3.2 or later.
  • (Optional) CUDA 3.2 or later.

Clusters

In addition to individual systems, SHOC can build parallel benchmark programs for clusters. Each cluster node must meet the requirements described earlier on this page for the OS distribution used on that node. The cluster must also have a working implementation of the Message Passing Interface (MPI) library, such as Open MPI or MPICH2 (http://www.mcs.anl.gov/research/projects/mpich2).

Changelog

Changelog, version 1.1.2
  • Added new benchmark, Breadth-First Search, a common graph traversal. Implementations of two leading algorithms are included. Thanks to Aditya Sarwade, a 2011 summer intern at ORNL, for contributing BFS.
  • Added texture memory optimization to OpenCL SpMV
  • Minor compatibility fixes for AMD CPUs
  • Macros added to S3D to fix precision issues with FP constants on some platforms
  • Changed TP Stencil default MPI topology to be more flexible
  • The shoc-help@elist.ornl.gov list is now open to public subscribers
  • Preview version of TP Sort using Thrust available here. This version will be included in the main distribution pending the release of CUDA 4.1.
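
For readers unfamiliar with the traversal behind the new BFS benchmark, the following is a minimal sequential sketch in Python. It is not one of the two GPU algorithms shipped with SHOC, which this page does not name; the function name bfs_levels and the adjacency-dictionary graph format are hypothetical choices made for illustration only.

```python
from collections import deque


def bfs_levels(adjacency, source):
    """Return the BFS level (edge distance from source) of each reachable vertex.

    adjacency maps a vertex to an iterable of its neighbors. This is a plain
    CPU illustration of breadth-first search, not SHOC's implementation.
    """
    levels = {source: 0}
    frontier = deque([source])
    while frontier:
        vertex = frontier.popleft()
        for neighbor in adjacency.get(vertex, ()):
            # A vertex is labeled the first time it is reached, which in BFS
            # order is always along a shortest path from the source.
            if neighbor not in levels:
                levels[neighbor] = levels[vertex] + 1
                frontier.append(neighbor)
    return levels


# Small hypothetical graph: 0 -> {1, 2}, 1 -> 3, 2 -> 3, 3 -> 4
graph = {0: [1, 2], 1: [3], 2: [3], 3: [4], 4: []}
print(bfs_levels(graph, 0))
```

Vertices that cannot be reached from the source are simply absent from the returned dictionary.
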
Changelog, version 1.1.1
  • Fixed build problem on systems that do not have OpenGL headers installed.
  • Fixed undefined reference build problem on systems that use recent gcc compilers.
  • Fixed bug in BusSpeedReadback benchmark that caused the benchmark to fail on some small-memory systems such as OS X laptops.
  • Added support for NVCXXFLAGS configure variable, so that different flags can be used when compiling C++ files and CUDA source files (e.g., compiler-specific options).
  • Modification to driver.pl script to test single device by default.
  • Documentation updates.
Changelog, version 1.1.0
  • Numerous bug fixes on AMD platforms.
  • New scan algorithm in CUDA and OpenCL. This upgrade from the previous recursive approach improves performance, reduces the required number of kernels to 3, and is more flexible when handling large problem sizes.
  • New TP versions of Reduction and Scan
  • Major refactoring of the driver script including finer device control. Previously, excessive execution times when using large problem sizes were unavoidable because the driver would always include available CPU devices.
  • Fixed minor memory leak in DeviceMemory that was causing problems on smaller GPUs (like ION).
  • Changed MaxFlops kernel generation to account for a more aggressive AMD compiler in APP SDK 2.4. This compiler was optimizing away some operations, resulting in inaccurate FLOPS measurements.
  • The OpenCL version of MaxFlops uses vector types now, so the AMD compiler can generate vectorized (SSE) code on CPU devices.
  • Better use of memory for a few Level 0 tests. This results in better support for low-memory devices, and in some tests higher performance can be achieved when more memory is available.
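
The scan entry above mentions a non-recursive algorithm that needs three kernels. This page does not say what those kernels are; one common way to organize a scan in three passes is a per-block reduction, a scan over the block sums, and a per-block scan seeded with each block's offset. The CPU sketch below illustrates that decomposition under this assumption. The function name three_phase_exclusive_scan and the block_size parameter are hypothetical and do not come from the SHOC sources.

```python
def three_phase_exclusive_scan(values, block_size=4):
    """Exclusive prefix sum computed in three passes over fixed-size blocks.

    Each pass stands in for what a GPU version might run as one kernel.
    This is an assumed decomposition, not SHOC's code.
    """
    blocks = [values[i:i + block_size] for i in range(0, len(values), block_size)]

    # Pass 1 (reduce): every block independently computes its own sum.
    block_sums = [sum(block) for block in blocks]

    # Pass 2 (scan of block sums): an exclusive scan over the block sums
    # yields the starting offset of each block in the final result.
    offsets = []
    running = 0
    for block_sum in block_sums:
        offsets.append(running)
        running += block_sum

    # Pass 3 (per-block scan): every block scans its own elements,
    # starting from the offset computed in pass 2.
    result = []
    for block, offset in zip(blocks, offsets):
        running = offset
        for value in block:
            result.append(running)
            running += value
    return result


print(three_phase_exclusive_scan([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

Because passes 1 and 3 treat blocks independently, the number of passes stays at three regardless of input length, as long as the list of block sums is small enough to scan in a single step. A recursive formulation would instead add passes as the input grows.
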
Changelog, version 1.0.3
  • Driver script now reports NoResult instead of BenchmarkError for results dependent on missing features (like double precision)
Changelog, version 1.0.2
  • Fixed Timer that resulted in an error for OpenCL Sort/Scan in some environments
  • Updated driver script to record driver and compiler version and ECC status (CUDA only)
  • Fixed bug in outlier computation
  • Fixed reporting of sentinel value (previously, null results were shown as FLT_MAX instead of appropriate message)
Changelog, version 1.01
  • Bugfixes and improvements to Stencil2D
  • Fix to build system when using CUDA and MPI
  • Fixes to OpenCL version of S3D, including DP and PCIe timing
  • Better consistency between CUDA and OpenCL versions of auto-generated kernels in L0
  • Addition of "-h" parameter to the driver script for specifying a hostfile
  • Added informative -help message for driver script
Changelog, version 1.0
  • Major bugfixes to Sparse Matrix-Vector Multiply
  • Fixes to Texture/Image memory tests
  • Various improvements to driver script for parallel and serial execution
  • Detection of mixed device types and performance outliers in parallel
  • Several bugfixes to the build system
  • Bug fix to the stability test when running on devices with large memory
  • Added double precision support to MaxFlops benchmark
  • Improved the bus contention measurement benchmark and its multi-threaded version
Changelog, version 0.9.2
  • Major fix to benchmark build system, which now supports gcc 4.4 compatibility
Changelog, version 0.9.1
  • Numerous bug fixes to OpenCL and CUDA benchmarks
  • Addition of Spmv benchmark, a sparse matrix-vector-multiply test
  • Addition of double precision support to many SHOC benchmarks (though not yet all)
  • Addition of an experimental driver for testing performance on NUMA systems
  • Changed SHOC to use a GNU autoconf-generated script for configuration
Changelog, version 0.9
  • Numerous bug fixes to OpenCL benchmarks including Scan, Sort, and Reduction
  • Addition of S3D benchmark, a combustion kernel
  • Flattened build system, including the ability to build without MPI
  • Fermi compatibility; simplified build process for devices of different compute capability
  • Made success more obvious in Stability test
  • Shared memory optimization for MD benchmark
  • Data sizes increased for BusSpeed benchmark
  • Fixed bug in GFLOPS calculation in PeakFlops
  • Several improvements to contention benchmark
  • DeviceBW memory access pattern changed to avoid caching effects on Fermi
  • Changed Scan and Sort to report results in GB/s
 
shoc/downloads.txt · Last modified: 2011/11/11 17:08 by kspafford