Welcome to the Vancouver Project
Funded by the DOE Office of Science ASCR, the Vancouver Project is developing a software stack for productive programming on scalable, heterogeneous architectures. If you're not familiar with Vancouver, read this overview first.
Scalable Heterogeneous Computing (SHC) systems present several challenges: low programmer productivity, limited portability, a lack of integrated, standard tools and libraries, and high sensitivity to performance variability. However, the most evident challenge is that few programming constructs can span the range from the fine-grain parallelism supported by heterogeneous computing devices to the large-scale parallelism required at exascale. For example, OpenCL can address the former and the Message Passing Interface (MPI) the latter, but few languages or software tools address both levels simultaneously. These challenges also hold for performance tools, debuggers, resource managers, and libraries. Taken together, these issues will impede the adoption of SHC architectures by erecting a very high entry barrier to application teams and their scientific productivity.
The Vancouver team proposes to address these challenges to performance and productivity with a three-tiered approach for a next-generation software infrastructure designed for Exascale computing:
- High-level systems and abstractions for heterogeneous computing
- Programming, development, and performance tools
- Low-level libraries, runtime systems, and benchmarks
We believe that, combined, these enhancements will greatly improve the productivity, and thus viability, of heterogeneous systems for Exascale high performance computing. More concretely, we will test, validate, and apply the infrastructure and tools we develop on systems and applications of interest to DOE, and we will deliver them as an open-source software stack. In fact, many of our team’s existing tools are already available on DOE systems. Furthermore, we will build upon our proven track record of working closely with application teams to port and optimize their applications onto SHC architectures. We believe that these close collaborations will be absolutely necessary in the early stages of the community’s transition to SHC systems, and we propose an aggressive outreach plan to help facilitate the success of this transition.
- R. Lim, A. Malony, B. Norris, N. Chaimov, "Identifying Optimization Opportunities within Kernel Execution in GPU Codes," International Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), August 2015.
- P. Sao, X. Liu, R. Vuduc, X. Li. "A sparse direct solver for distributed memory Xeon Phi-accelerated systems." In IPDPS, May 2015.
- X. Dai, B. Norris, A. Malony, "Autoperf: Workflow Support for Performance Experiments," Workshop on Challenges in Performance Methods for Software Development (WOSP-C 2015), January 2015.
- D. Ozog, A. Siegel, A. Malony, "Full-Core PWR Transport Simulations on Xeon Phi Clusters," Joint International Conference on Mathematics and Computation, Supercomputing in Nuclear Applications and the Monte Carlo Method (SNA+MC 2015), April 2015.
- D. Ozog, A. Siegel, A. Malony, "A Performance Analysis of SIMD Algorithms for Monte Carlo Simulations of Nuclear Reactor Cores," IEEE International Parallel and Distributed Symposium (IPDPS 2015), May 2015.
- Amit Sabne, Putt Sakdhnagool, Seyong Lee, and Jeffrey S. Vetter, Understanding Portability of a High-level Programming Model on Contemporary Heterogeneous Architectures, IEEE Micro, 2015
- Seyong Lee, Jeremy S. Meredith, and Jeffrey S. Vetter, COMPASS: A Framework for Automated Performance Modeling and Prediction, ACM International Conference on Supercomputing (ICS) 2015
- M. Graham Lopez, Jeffrey Young, Jeremy S. Meredith, Phil C. Roth, Mitchel Horton, Jeffrey S. Vetter, "Examining Recent Many-core Architectures and Programming Models Using SHOC", 6th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS15) (in conjunction with SC15), 2015.
- H.-S. Kim, I. E. Hajj, J. A. Stratton, S. S Lumetta, W.W. Hwu, "Locality-Centric Thread Scheduling for Bulk-synchronous Programming Models on CPU Architectures", International Symposium on Code Generation and Optimization (CGO), February 2015.
- J. Cabezas, L. Vilanova, I. Gelado, T. Jablin, N. Navarro, W. W. Hwu, "Automatic execution of single-GPU computations across multiple GPUs", Proceedings of the 23rd international conference on Parallel architectures and compilation (PACT), 2014
- C. I. Rodrigues, A. Dakkak, T. Jablin, and W.W. Hwu, "Triolet: A Programming System that Unifies Algorithmic Skeleton Interfaces for High-Performance Cluster Computing", Proceedings of the 2014 ACM SIGPLAN Conference on Principles and Practice of Parallel Programming, February 2014.
- I. R. Sung, J. Gómez-Luna, J. M. González-Linares, N. Guil, W. W. Hwu, "In-place transposition of rectangular matrices on accelerators", PPoPP '14 Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 2014.
- Nicholas Chaimov, Boyana Norris and Allen D. Malony, “Toward Multi-target Autotuning for Accelerators”, submitted to HPCC 2014: IEEE International Conference on High Performance Computing and Communications, August 2014.
- Piyush Sao, Richard Vuduc, Xiaoye Li, "A distributed CPU-GPU sparse direct solver." In Euro-Par, August 2014.
- Seyong Lee and Jeffrey S. Vetter, OpenARC: Extensible OpenACC Compiler Framework for Directive-Based Accelerator Programming Study, Workshop on Accelerator Programming Using Directives (WACCPD) in conjunction with SC14, 2014.
- Amit Sabne, Putt Sakdhnagool, Seyong Lee, and Jeffrey S. Vetter, Evaluating Performance Portability of OpenACC, LCPC’14: The 27th International Workshop on Languages and Compilers for Parallel Computing, 2014
- Seyong Lee and Jeffrey S. Vetter, OpenARC: Open Accelerator Research Compiler for Directive-Based, Efficient Heterogeneous Computing, HPDC14: International ACM Symposium on High-Performance Parallel and Distributed Computing, Short Paper, June 2014.
- Seyong Lee, Dong Li, and Jeffrey S. Vetter, Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing, IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
- Seyong Lee and Jeffrey S. Vetter, OpenARC: Open Accelerator Research Compiler for Directive-Based, Heterogeneous Computing, GTC14: GPU Technology Conference, Poster, March 2014.
- P. Sao, R. Vuduc. "Self-stabilizing iterative solvers." In ScalA workshop at SC13, November 2013.
- M. Dukhan, R. Vuduc. "Methods for high-throughput elementary functions." In PPAM, September 2013.
- J. Choi, D. Bedard, R. Fowler, R. Vuduc. "A roofline model of energy." In IPDPS, May 2013.
- H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi, W.-m. Hwu. "Performance analysis and tuning for general purpose graphics processing units." Synthesis Lectures on Computer Architecture, 2012.
- Seyong Lee and Jeffrey S. Vetter, Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing, SC12: ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2012.
- L. Chang, J.A. Stratton, H. Kim, and W.W. Hwu, “A Scalable, Numerically Stable Tridiagonal Solver Using GPUs,” The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’12), Salt Lake City, 2012.
- J.A. Stratton, C. Rodrigues, I. R. Sung, L. Chang, N. Anssari, G. D. Liu, W. W. Hwu, and N. Obeid, “Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems,” IEEE Computer, August 2012, pp. 26-32.
- K. Czechowski, C. McClanahan, C. Battaglino, K. Iyer, P.-K. Yeung, R. Vuduc. "On the communication complexity of 3D FFTs and its implications for exascale." In ICS, June 2012.
- I.-J. Sung, G. D. Liu, and W. W. Hwu, “DL: A Data Layout Transformation System for Heterogeneous Computing,” The IEEE Innovative Parallel Computing Conference – Foundations and Applications of GPU, Manycore, and Heterogeneous Systems, San Jose, May, 2012.
- J. A. Stratton, N. Anssari, C. I. Rodrigues, I. Sung, N. Obeid, L. Chang, G. Liu, and W. Hwu, “Optimization and Architecture Effects on GPU Computing Workload Performance,” The IEEE Innovative Parallel Computing – Foundations and Applications, San Jose, May, 2012.
- A. Chandramowlishwaran, J.W. Choi, K. Madduri, R. Vuduc. "Brief Announcement: Towards a communication optimal fast multipole method and its implications for exascale." In SPAA, 2012.
- S. S. Baghsorkhi, I. Gelado, M. Delahaye, W. W. Hwu, “Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors,” Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, February, 2012
- J. Sim, A. Dasgupta, H. Kim, R. Vuduc. "A performance analysis framework for identifying performance benefits in GPGPU applications." In PPoPP, February 2012.
- Spafford, K., Meredith, J., Vetter, J. Quantifying NUMA and Contention Effects in Multi-GPU Systems. Proceedings of the Fourth Workshop on General-Purpose Computation on Graphics Processors. March 2011.
- K. Spafford, J. Meredith, J. Vetter. “Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort” Proceedings of the Workshop on Parallel Programming on Accelerator Clusters (PPAC 2011). Austin, TX, USA. September 2011.
- A. Malony, S. Biersdorff, S. Shende, H. Jagode, S. Tomov, G. Juckeland, R. Dietrich, D. Poole, C. Lamb, "Parallel Performance Measurement of Heterogeneous Parallel Systems with GPUs", International Conference on Parallel Processing (ICPP), September 2011.
Recent Software Releases
- TAU Performance System (TAU v2.25, PDT v3.21), available: http://tau.uoregon.edu/
  - Better parsing support in PDT for C and C++ based on EDG v4.10.1 parsers, and for Fortran based on the gfortran 4.8.5 parser. These parsers have been ported to Linux x86_64, IBM BG/Q, IBM Power 8 Linux (ppc64le), ARM64 Linux, and Mac OS X.
  - Support for CUDA v7.5 and OpenACC compilers from PGI.
  - Updates for OpenCL.
  - Support for Intel Xeon Phi platforms, including support for BFD (for address translation), libunwind, and the OpenMP tools interface.
  - Support for tracking energy consumption in code regions.
  - Support for tracking non-uniform memory access (NUMA) in tau_exec.
  - Support for the OpenMP Tools interface (OMPT) based on the LLVM runtime.
  - Updates to better support MPC (http://mpc.paratools.fr).
- SHOC Benchmark Suite
- OpenARC: Open Accelerator Research Compiler, http://ft.ornl.gov/research/openarc
- GMAC Library (v0.0.20), available: http://code.google.com/p/adsm/
- DiGPUFFT (distributed GPU FFT): http://code.google.com/p/digpufft/
- Yeppp!, a SIMD-optimized math library: http://www.yeppp.info/
- SuperLU_DIST 4.0 (with GPU support): http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
- How to instrument SHOC with TAU on Keeneland, http://www.nic.uoregon.edu/tau-wiki/Keeneland
Contact the Team
- Jeffrey Vetter, ORNL and Georgia Tech
- Allen Malony, University of Oregon
- Wen-Mei Hwu, UIUC
- Rich Vuduc, Georgia Tech