Changes between Version 44 and Version 45 of WikiStart

Dec 7, 2015 4:17:02 PM (2 years ago)

Added more GT pubs


  • WikiStart

    v44 v45  
    3636== Recent Publications ==
    3737 * R. Lim, A. Malony, B. Norris, N. Chaimov, "Identifying Optimization Opportunities within Kernel Execution in GPU Codes," International Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), August 2015.
     38 * P. Sao, X. Liu, R. Vuduc, X. Li. "A sparse direct solver for distributed memory Xeon Phi-accelerated systems." In //IPDPS//, May 2015.
    3839 * X. Dai, B. Norris, A. Malony, "Autoperf: Workflow Support for Performance Experiments," Workshop on Challenges in Performance Methods for Software Development (WOSP-C 2015), January 2015.
    3940 * D. Ozog, A. Siegel, A. Malony, "Full-Core PWR Transport Simulations on Xeon Phi Clusters," Joint International Conference on Mathematics and Computation, Supercomputing in Nuclear Applications and the Monte Carlo Method (SNA+MC 2015), April 2015.
    5354 * Seyong Lee, Dong Li, and Jeffrey S. Vetter, Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing, IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
    5455 * Seyong Lee and Jeffrey S. Vetter, OpenARC: Open Accelerator Research Compiler for Directive-Based, Heterogeneous Computing, GTC14: GPU Technology Conference, Poster, March 2014.
     56 * P. Sao, R. Vuduc. "Self-stabilizing iterative solvers." In //ScalA// workshop at SC13, November 2013.
     57 * M. Dukhan, R. Vuduc. "Methods for high-throughput elementary functions." In //PPAM//, September 2013.
     58 * J. Choi, D. Bedard, R. Fowler, R. Vuduc. "A roofline model of energy." In //IPDPS//, May 2013.
     59 * H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi, W.-m. Hwu. "Performance analysis and tuning for general purpose graphics processing units." //Synthesis Lectures on Computer Architecture//, 2012.
    5560 * Seyong Lee and Jeffrey S. Vetter, Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing, SC12: ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2012.
    5661 * L. Chang, J.A. Stratton, H. Kim, and W.W. Hwu, “A Scalable, Numerically Stable Tridiagonal Solver Using GPUs,” The International Conference for High-Performance Computing Networking, Storage, and Analysis (SC’12), Salt Lake City, 2012.
    5762 * J.A. Stratton, C. Rodrigues, I. R. Sung, L. Chang, N. Anssari, G. D. Liu, W. W. Hwu, and N. Obeid, “Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems,” IEEE Computer, August 2012, pp. 26-32.
     63 * K. Czechowski, C. McClanahan, C. Battaglino, K. Iyer, P.-K. Yeung, R. Vuduc. "On the communication complexity of 3D FFTs and its implications for exascale." In //ICS//, June 2012.
    5864 * I.-J. Sung, G. D. Liu, and W. W. Hwu, “DL: A Data Layout Transformation System for Heterogeneous Computing,” The IEEE Innovative Parallel Computing Conference – Foundations and Applications of GPU, Manycore, and Heterogeneous Systems, San Jose, May, 2012.
    5965 * J. A. Stratton, N. Anssari, C. I. Rodrigues, I. Sung, N. Obeid, L. Chang, G. Liu, and W. Hwu, “Optimization and Architecture Effects on GPU Computing Workload Performance,” The IEEE Innovative Parallel Computing – Foundations and Applications, San Jose, May, 2012.
     66 * A. Chandramowlishwaran, J.W. Choi, K. Madduri, R. Vuduc. "//Brief Announcement:// Towards a communication optimal fast multipole method and its implications for exascale." In //SPAA//, 2012.
    6067 * S. S. Baghsorkhi, I. Gelado, M. Delahaye, W. W. Hwu, “Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors,” Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February, 2012.
     68 * J. Sim, A. Dasgupta, H. Kim, R. Vuduc. "A performance analysis framework for identifying performance benefits in GPGPU applications." In //PPoPP//, February 2012.
    6169 * Spafford, K., Meredith, J., Vetter, J. Quantifying NUMA and Contention Effects in Multi-GPU Systems. Proceedings of the Fourth Workshop on General-Purpose Computation on Graphics Processors. March 2011.
    6270 * K. Spafford, J. Meredith, J. Vetter. “Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort,” Proceedings of the Workshop on Parallel Programming on Accelerator Clusters (PPAC 2011). Austin, TX, USA. September 2011.
    6674== Recent Software Releases ==
    67 * TAU Performance System ( TAU v2.25, PDT v3.21), available:
     75 * TAU Performance System ( TAU v2.25, PDT v3.21), available:
    6876  * 1. Better parsing support in PDT for C, C++ based on EDG v4.10.1 parsers and for Fortran based on the gfortran 4.8.5 parser. These parsers have been ported to Linux x86_64, IBM BG/Q, IBM Power 8 Linux (ppc64le), ARM64 Linux, and Mac OS X.
    6977  * Support for CUDA v7.5 and OpenACC compilers from PGI.
    8189 * DiGPUFFT (distributed GPU FFT):
    8290 * Yeppp!, a SIMD-optimized math library:
    83 * SuperLU_DIST 4.0 (with GPU support):
     91 * SuperLU_DIST 4.0 (with GPU support):
    8593== Software Guides ==