Conference Papers

  1. [ISC'19] Ignacio Laguna, Paul C. Wood, Ranvijay Singh, Saurabh Bagchi. GPUMixer: Performance-Driven Floating-Point Tuning for GPU Scientific Applications. ISC High Performance, Frankfurt, Germany, Jun 16-20, 2019 (Best Paper).

  2. [ICS'19] Pradeep Kotipalli, Ranvijay Singh, Paul Wood, Ignacio Laguna, and Saurabh Bagchi. AMPT-GA: Automatic Mixed Precision Floating Point Tuning for GPU Applications. 33rd ACM International Conference on Supercomputing (ICS), pp. 1-11, Jun 26-28, Phoenix, AZ.

  3. [HPDC'19] Michael Bentley, Ian Briggs, Ganesh Gopalakrishnan, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Holger E. Jones. Multi-level Analysis of Compiler-Induced Variability and Performance Tradeoffs. The 28th International Symposium on High-Performance Parallel and Distributed Computing, Phoenix, Arizona, USA - June 24-28, 2019.

  4. [IPDPS'19] Giorgis Georgakoudis, Ignacio Laguna, Hans Vandierendonck, Dimitrios S. Nikolopoulos, Martin Schulz. SAFIRE: Scalable and Accurate Fault Injection For Parallel Multithreaded Applications. 33rd IEEE International Parallel & Distributed Processing Symposium (IPDPS), Rio de Janeiro, Brazil, May 20-24, 2019.

  5. [SC'18] Luanzheng Guo, Dong Li, Ignacio Laguna, Martin Schulz. FlipTracker: Understanding Natural Error Resilience in HPC Applications. ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Dallas, TX, 2018.

  6. [EuroMPI'18] Nawrin Sultana, Anthony Skjellum, Ignacio Laguna, Matthew Shane Farmer, Kathryn Mohror and Murali Emani. MPI Stages: Checkpointing MPI State for Bulk Synchronous Applications. In Proceedings of the 25th European MPI Users Group Meeting (EuroMPI), Barcelona, Spain, Sep. 23-26, 2018.

  7. [IPDPS'18] Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Ignacio Laguna, Gregory L Lee, Dong H Ahn. SWORD: A Bounded Memory-Overhead Detector of OpenMP Data Races in Production Runs. The 32nd IEEE International Parallel and Distributed Processing Symposium (IPDPS), May, Vancouver, Canada, 2018.

  8. [SC'17] Giorgis Georgakoudis, Ignacio Laguna, Dimitrios S. Nikolopoulos, Martin Schulz. REFINE: Realistic Fault Injection via Compiler-Based Instrumentation for Accuracy, Portability and Speed. ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Denver, CO, 2017.

  9. [PPoPP'17] Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz, and Christopher M Chambreau. Noise Injection Techniques to Expose Subtle and Unintended Message Races. In Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), Austin, Texas, USA, Feb, 2017.

  10. [IPDPS'17] David Beckingsale, Olga Pearce, Ignacio Laguna, and Todd Gamblin. Apollo: Reusable Models for Fast, Dynamic Tuning of Input-Dependent Code. In the 31st IEEE International Parallel and Distributed Processing Symposium (IPDPS), May, Orlando, Florida, USA, 2017.

  11. [SC'16] Ignacio Laguna, Martin Schulz. Pinpointing Scale-Dependent Integer Overflow Bugs in Large-Scale Parallel Applications. ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Salt Lake City, 2016.

  12. [IPDPS'16] Simone Atzeni, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Dong H. Ahn, Ignacio Laguna, Martin Schulz, Gregory L. Lee, Joachim Protze, Matthias S. Muller. ARCHER: Effectively Spotting Data Races in Large OpenMP Applications. In 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Chicago, May 23-27, 2016.

  13. [CGO'16] Ignacio Laguna, Martin Schulz, David F. Richards, Jon Calhoun, Luke Olson. IPAS: Intelligent Protection Against Silent Output Corruption in Scientific Applications. In the 14th IEEE/ACM International Symposium on Code Generation and Optimization (CGO), Barcelona, March 12-18, 2016.

  14. [SC'15] Kento Sato, Dong H. Ahn, Ignacio Laguna, Gregory L. Lee, Martin Schulz. Clock Delta Compression for Scalable Order-Replay of Non-Deterministic Parallel Applications. In the ACM/IEEE Conference for High Performance Computing, Networking, Storage and Analysis (SC), Austin, Texas, Nov, 2015.

  15. [ICCS'15] A. Chien, P. Balaji, P. Beckman, N. Dun, A. Fang, H. Fujita, K. Iskra, Z. Rubenstein, Z. Zheng, R. Schreiber, J. Hammond, J. Dinan, I. Laguna, D. Richards, A. Dubey, B. van Straalen, M. Hoemmen, M. Heroux, K. Teranishi, A. Siegel. Versioned Distributed Arrays for Resilience in Scientific Applications: Global View Resilience. In the International Conference On Computational Science (ICCS), Reykjavik, Iceland, June 1-3, 2015.

  16. [EuroMPI'14] Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski. Evaluating User-Level Fault Tolerance for MPI Applications. In EuroMPI/ASIA, Kyoto, Japan, Sep 9-12, 2014.

  17. [PLDI'14] Subrata Mitra, Ignacio Laguna, Dong H. Ahn, Saurabh Bagchi, Martin Schulz, and Todd Gamblin. Accurate Application Progress Analysis for Large-Scale Parallel Debugging. In ACM International Symposium on Programming Language Design and Implementation (PLDI), Edinburgh, UK, June 9-11, 2014.

  18. [SRDS'13] Ignacio Laguna, Subrata Mitra, Fahad A Arshad, Nawanol Theera-Ampornpunt, Zongyang Zhu, Saurabh Bagchi, Samuel P Midkiff, Mike Kistler, Ahmed Gheith. Automatic Problem Localization via Multidimensional Metric Profiling. In IEEE 32nd International Symposium on Reliable Distributed Systems (SRDS), Braga, Portugal, Sep-Oct, 2013.

  19. [PACT'12] Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Saurabh Bagchi, Todd Gamblin. Probabilistic Diagnosis of Performance Faults in Large-Scale Parallel Applications. In International Conference on Parallel Architectures and Compilation Techniques (PACT), Minneapolis, MN, Sep, 2012.

  20. [DSN'12] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi and Bronis R. de Supinski. Characterization via Abnormality-Enhanced Classification. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Boston, Massachusetts, Jun, 2012.

  21. [SC'11] Ignacio Laguna, Todd Gamblin, Bronis R. de Supinski, Saurabh Bagchi, Greg Bronevetsky, Dong H. Ahn, Martin Schulz, Barry Rountree. Large Scale Debugging of Parallel Tasks with AutomaDeD. In ACM/IEEE Supercomputing (SC), Seattle, WA, Nov 2011.

  22. [DSN'10] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, Martin Schulz. AutomaDeD: Automata-Based Debugging for Dissimilar Parallel Tasks. In IEEE/IFIP International Conference on Dependable Systems and Networks (DSN), Chicago, Illinois, Jun-Jul, 2010.

  23. [Middleware'09] Ignacio Laguna, Fahad A. Arshad, David M. Grothe, Saurabh Bagchi. How To Keep Your Head Above Water While Detecting Errors. In ACM/IFIP/USENIX 10th International Middleware Conference (Middleware), UIUC Illinois, Nov-Dec 2009.

  24. [SC'09] Dong H. Ahn, Bronis R. de Supinski, Ignacio Laguna, Greg L. Lee, Ben Liblit, Barton P. Miller, and Martin Schulz. Scalable Temporal Order Analysis for Large Scale Debugging. In ACM/IEEE Supercomputing (SC), Portland, OR, Nov 2009.

  25. [SRDS'07] Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi. Distributed Diagnosis of Failures in a Three Tier E-Commerce System. In IEEE Symposium on Reliable Distributed Systems (SRDS), Beijing, China, Oct 2007.

  26. [SRDS'07] Gunjan Khanna, Ignacio Laguna, Fahad A. Arshad and Saurabh Bagchi. Stateful Detection in High Throughput Distributed Systems. In IEEE Symposium on Reliable Distributed Systems (SRDS), Beijing, China, Oct 2007.

Journal Papers

  1. [ParCo] Nawrin Sultana, Martin Rüfenacht, Anthony Skjellum, Ignacio Laguna, and Kathryn Mohror. Failure recovery for bulk synchronous applications with MPI stages. Parallel Computing, Volume 84, May 2019, Pages 1-14.

  2. [IJHPCA] Kento Sato, Ignacio Laguna, Gregory L Lee, Martin Schulz, Christopher M Chambreau, Simone Atzeni, Michael Bentley, et al. Pruners: Providing reproducibility for uncovering non-deterministic errors in runs on supercomputers. The International Journal of High Performance Computing Applications.

  3. [CCPE] Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni. EReinit: Scalable and Efficient Fault Tolerance for Bulk-Synchronous MPI Applications. Concurrency and Computation: Practice and Experience, Wiley.

  4. [IJHPCA] A. Chien, P. Balaji, N. Dun, A. Fang, H. Fujita, K. Iskra, Z. Rubenstein, Z. Zheng, J. Hammond, I. Laguna, D. Richards, A. Dubey, B. van Straalen, M. Hoemmen, M. Heroux, K. Teranishi, A. Siegel. Exploring versioned distributed arrays for resilience in scientific applications: global view resilience. The International Journal of High Performance Computing Applications (IJHPCA), 31, no. 6 (2017): 564-590.

  5. [IJHPCA] Ignacio Laguna, David F. Richards, Todd Gamblin, Martin Schulz, Bronis R. de Supinski, Kathryn Mohror, and Howard Pritchard. Evaluating and Extending User-Level Fault Tolerance in MPI. The International Journal of High Performance Computing Applications (IJHPCA), vol. 30, num. 3, pp. 305-319, Sep, 2016.

  6. [CACM] Ignacio Laguna, Dong H. Ahn, Bronis R. de Supinski, Todd Gamblin, Gregory L. Lee, Martin Schulz, Saurabh Bagchi, Milind Kulkarni, Bowen Zhou, Zhezhe Chen, and Feng Qin. Debugging high-performance computing applications at massive scales. In Communications of the ACM, September, 2015.

  7. [TPDS] Ignacio Laguna, Dong Ahn, Bronis de Supinski, Saurabh Bagchi, and Todd Gamblin. Diagnosis of Performance Faults in Large Scale MPI Applications via Probabilistic Progress-Dependence Inference. IEEE Transactions on Parallel and Distributed Systems (TPDS), vol. 26, no. 5, pp. 1280-1289, May 2015.

  8. [CSE] Martin Schulz, Jim Belak, Abhinav Bhatele, Peer-Timo Bremer, Greg Bronevetsky, Marc Casas, Todd Gamblin, Katherine E. Isaacs, Ignacio Laguna, Joshua Levine, Valerio Pascucci, David Richards, Barry Rountree. Performance analysis techniques for the exascale co-design process. Parallel Computing: Accelerating Computational Science and Engineering (CSE), vol. 25, p. 19, 2014, IOS Press.

Workshop Papers

  1. [ScalA] Ranvijay Singh, Paul Wood, Ravi Gupta, Saurabh Bagchi, and Ignacio Laguna. Snowpack: efficient parameter choice for GPU kernels via static analysis and statistical prediction. In Proceedings of the 8th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA ’17), @SC17, Denver, CO, 2017.

  2. [FTXS] Ayush Patwari, Ignacio Laguna, Martin Schulz, Saurabh Bagchi. Understanding the Spatial Characteristics of DRAM Errors in HPC Clusters. The 7th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS) @HPDC, Washington, D.C., USA, Jun, 2017.

  3. [IWOMP] Joachim Protze, Dong H. Ahn, Ignacio Laguna, Martin Schulz, and Matthias S. Muller. Testing Infrastructure for OpenMP Debugging Interface Implementations. In the International Workshop on OpenMP (IWOMP), Oct 5, 2016.

  4. [IWOMP] Joachim Protze, Ignacio Laguna, Dong H. Ahn, John DelSignore, Ariel Burton, Martin Schulz, and Matthias S. Muller. Lessons Learned from Implementing OMPD: a Debugging Interface for OpenMP. In the 11th International Workshop on OpenMP (IWOMP), Aachen, Germany, October 1-2, 2015.

  5. [LLVM-HPC] Joachim Protze, Simone Atzeni, Dong H Ahn, Martin Schulz, Ganesh Gopalakrishnan, Matthias S Muller, Ignacio Laguna, Zvonimir Rakamaric, Greg L Lee. Towards providing low-overhead data race detection for large OpenMP applications. In Workshop on LLVM Compiler Infrastructure in HPC, held in conjunction with SC’14, New Orleans, Louisiana, Nov, 2014.

  6. [ScalA] Ignacio Laguna, Edgar A Leon, Martin Schulz, Mark Stephenson. A study of application-level recovery methods for transient network faults. In Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA’13), held in conjunction with SC’13, Denver, Colorado, Nov, 2013.

  7. [SEHPCCSE] Dong H Ahn, Gregory L Lee, Ganesh Gopalakrishnan, Zvonimir Rakamaric, Martin Schulz, Ignacio Laguna. Overcoming extreme-scale reproducibility challenges through a unified, targeted, and multilevel toolset. In 1st International Workshop on Software Engineering for High Performance Computing in Computational Science and Engineering (SEHPCCSE’13), held in conjunction with SC’13, Denver, Colorado, Nov, 2013.

  8. [SELSE] Greg Bronevetsky, Ignacio Laguna, Saurabh Bagchi, Bronis R. de Supinski, Dong H. Ahn, and Martin Schulz. Statistical Fault Detection for Parallel Applications with AutomaDeD. In 6th IEEE Workshop on Silicon Errors in Logic - System Effects (SELSE’10), Stanford, CA, Mar 23-24, 2010.