Changes between Version 44 and Version 45 of WikiStart
 Timestamp: Dec 7, 2015 4:17:02 PM

WikiStart
v44  v45
 36   36    == Recent Publications ==
 37   37     * R. Lim, A. Malony, B. Norris, N. Chaimov, "Identifying Optimization Opportunities within Kernel Execution in GPU Codes," International Workshop on Algorithms, Models, and Tools for Parallel Computing on Heterogeneous Platforms (HeteroPar), August 2015.
      38  +  * P. Sao, X. Liu, R. Vuduc, X. Li. "A sparse direct solver for distributed memory Xeon Phi-accelerated systems." In //IPDPS//, May 2015.
 38   39     * X. Dai, B. Norris, A. Malony, "Autoperf: Workflow Support for Performance Experiments," Workshop on Challenges in Performance Methods for Software Development (WOSPC 2015), January 2015.
 39   40     * D. Ozog, A. Siegel, A. Malony, "Full-Core PWR Transport Simulations on Xeon Phi Clusters," Joint International Conference on Mathematics and Computation, Supercomputing in Nuclear Applications and the Monte Carlo Method (SNA+MC 2015), April 2015.
  …    …
 53   54     * Seyong Lee, Dong Li, and Jeffrey S. Vetter, Interactive Program Debugging and Optimization for Directive-Based, Efficient GPU Computing, IEEE International Parallel and Distributed Processing Symposium (IPDPS), May 2014.
 54   55     * Seyong Lee and Jeffrey S. Vetter, OpenARC: Open Accelerator Research Compiler for Directive-Based, Heterogeneous Computing, GTC14: GPU Technology Conference, Poster, March 2014.
      56  +  * P. Sao, R. Vuduc. "Self-stabilizing iterative solvers." In //ScalA// workshop at SC13, November 2013.
      57  +  * M. Dukhan, R. Vuduc. "Methods for high-throughput elementary functions." In //PPAM//, September 2013.
      58  +  * J. Choi, D. Bedard, R. Fowler, R. Vuduc. "A roofline model of energy." In //IPDPS//, May 2013.
      59  +  * H. Kim, R. Vuduc, S. Baghsorkhi, J. Choi, W.-m. Hwu. "Performance analysis and tuning for general purpose graphics processing units." //Synthesis Lectures on Computer Architecture//, 2012.
 55   60     * Seyong Lee and Jeffrey S. Vetter, Evaluation of Directive-Based GPU Programming Models for Productive Exascale Computing, SC12: ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis, November 2012.
 56   61     * L. Chang, J.A. Stratton, H. Kim, and W.W. Hwu, “A Scalable, Numerically Stable Tridiagonal Solver Using GPUs,” The International Conference for High-Performance Computing Networking, Storage, and Analysis (SC’12), Salt Lake City, 2012.
 57   62     * J.A. Stratton, C. Rodrigues, I. R. Sung, L. Chang, N. Anssari, G. D. Liu, W. W. Hwu, and N. Obeid, “Algorithm and Data Optimization Techniques for Scaling to Massively Threaded Systems,” IEEE Computer, August 2012, pp. 26-32.
      63  +  * K. Czechowski, C. McClanahan, C. Battaglino, K. Iyer, P.K. Yeung, R. Vuduc. "On the communication complexity of 3D FFTs and its implications for exascale." In //ICS//, June 2012.
 58   64     * I.J. Sung, G. D. Liu, and W. W. Hwu, “DL: A Data Layout Transformation System for Heterogeneous Computing,” The IEEE Innovative Parallel Computing Conference – Foundations and Applications of GPU, Manycore, and Heterogeneous Systems, San Jose, May, 2012.
 59   65     * J. A. Stratton, N. Anssari, C. I. Rodrigues, I. Sung, N. Obeid, L. Chang, G. Liu, and W. Hwu, “Optimization and Architecture Effects on GPU Computing Workload Performance,” The IEEE Innovative Parallel Computing – Foundations and Applications, San Jose, May, 2012.
      66  +  * A. Chandramowlishwaran, J.W. Choi, K. Madduri, R. Vuduc. "//Brief Announcement:// Towards a communication optimal fast multipole method and its implications for exascale." In //SPAA//, 2012.
 60   67     * S. S. Baghsorkhi, I. Gelado, M. Delahaye, W. W. Hwu, “Efficient Performance Evaluation of Memory Hierarchy for Highly Multithreaded Graphics Processors,” Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming, February, 2012
      68  +  * J. Sim, A. Dasgupta, H. Kim, R. Vuduc. "A performance analysis framework for identifying performance benefits in GPGPU applications." In //PPoPP//, February 2012.
 61   69     * Spafford, K., Meredith, J., Vetter, J. Quantifying NUMA and Contention Effects in Multi-GPU Systems. Proceedings of the Fourth Workshop on General-Purpose Computation on Graphics Processors. March 2011.
 62   70     * K. Spafford, J. Meredith, J. Vetter. “Quartile and Outlier Detection on Heterogeneous Clusters Using Distributed Radix Sort” Proceedings of the Workshop on Parallel Programming on Accelerator Clusters (PPAC 2011). Austin, TX, USA. September 2011.
  …    …
 65   73
 66   74    == Recent Software Releases ==
 67       -  * TAU Performance System ( TAU v2.25, PDT v3.21), available: http://tau.uoregon.edu/
      75  +  * TAU Performance System ( TAU v2.25, PDT v3.21), available: http://tau.uoregon.edu/
 68   76     * 1. Better parsing support in PDT for C, C++ based on EDG v4.10.1 parsers and for Fortran based on the gfortran 4.8.5 parser. These parsers have been ported to Linux x86_64, IBM BG/Q, IBM Power 8 Linux (ppc64le), ARM64 Linux, and Mac OS X.
 69   77     * Support for CUDAv 7.5 and OpenACC compilers from PGI.
  …    …
 81   89     * DiGPUFFT (distributed GPU FFT): http://code.google.com/p/digpufft/
 82   90     * Yeppp!, a SIMD-optimized math library: http://www.yeppp.info/
 83       -  * SuperLU_DIST 4.0 (with GPU support): http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
      91  +  * SuperLU_DIST 4.0 (with GPU support): http://crd-legacy.lbl.gov/~xiaoye/SuperLU/
 84   92
 85   93    == Software Guides ==