Craig Ulmer

EMPIRE I/O Evaluation

2018-10-10 faodel hpc io pub

This year my I/O project has been part of an ASC Level 2 milestone to evaluate the readiness of the EMPIRE simulation code that's being developed at Sandia. One mode of EMPIRE uses particle-in-cell methods to simulate plasma environments under different conditions. In a nutshell, the simulation tracks hundreds of millions of particles as they move through a mesh in order to observe their electromagnetic effects. My part of this work has focused on making sure I/O is performant, and that the simulator will be able to checkpoint data (particles and mesh variables) without killing simulation performance. Since existing I/O libraries were used for field data, the bulk of this work has focused on using FAODEL to route raw, intermediate state to disk and reload it.


After a considerable amount of work, we wrote hooks to have EMPIRE write particle/field data out through FAODEL's API. Our code packed data into Lunasa data objects and then published the objects out to a pool. For simplicity we used local pools that wrote to disk, but in later tests we also wrote to distributed hash tables. We ran a large number of tests on both the KNL and Haswell partitions of our Cray XC40 platform, and compared writes to both Lustre and the DataWarp Burst Buffer. FAODEL provided I/O speedups on both types of storage because it helped streamline our I/O. Interestingly, we noticed that all I/O suffers on the KNL processors, due to the poor serial performance of these CPUs compared to Haswell.
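
To make the pack-and-publish pattern concrete, here is a minimal Python sketch. It is not FAODEL or Lunasa code (both are C++ libraries), and every name in it (`pack_particles`, `unpack_particles`, `LocalDiskPool`, the key format, the blob layout) is a hypothetical stand-in chosen for illustration. It only shows the general idea: pack particle arrays into one contiguous object, publish it under a key to a pool that writes to local disk, and reload it later.

```python
import os
import struct
import tempfile
from array import array

# Hypothetical blob layout: a small header (step number, particle count)
# followed by the x, y, and z coordinate arrays stored back to back.
HEADER = struct.Struct("=IQ")


def pack_particles(step, xs, ys, zs):
    """Pack per-particle coordinates into one contiguous bytes object."""
    if not (len(xs) == len(ys) == len(zs)):
        raise ValueError("coordinate arrays must have equal length")
    body = array("d", xs) + array("d", ys) + array("d", zs)
    return HEADER.pack(step, len(xs)) + body.tobytes()


def unpack_particles(blob):
    """Reverse pack_particles: return (step, xs, ys, zs)."""
    step, count = HEADER.unpack_from(blob, 0)
    body = array("d")
    body.frombytes(blob[HEADER.size:])
    if len(body) != 3 * count:
        raise ValueError("blob size does not match its header")
    xs = body[:count].tolist()
    ys = body[count:2 * count].tolist()
    zs = body[2 * count:].tolist()
    return step, xs, ys, zs


class LocalDiskPool:
    """Toy stand-in for a local pool that persists each object to disk."""

    def __init__(self, root):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key):
        return os.path.join(self.root, key + ".obj")

    def publish(self, key, blob):
        # Write to a temporary file first, then rename, so a reader
        # never sees a partially written object.
        tmp = self._path(key) + ".tmp"
        with open(tmp, "wb") as f:
            f.write(blob)
        os.replace(tmp, self._path(key))

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        pool = LocalDiskPool(root)
        key = "rank0_step42"  # hypothetical key scheme: one object per rank per step
        pool.publish(key, pack_particles(42, [0.0, 1.0], [2.0, 3.0], [4.0, 5.0]))
        print(unpack_particles(pool.get(key)))
```

Packing everything for a step into a single object means each checkpoint becomes one large sequential write per key rather than many small ones, which is the kind of streamlining the paragraph above refers to. A distributed hash table pool would keep the same publish/get interface but place objects on remote nodes based on the key.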


While the milestone was a considerable amount of work for me, it was a good experience because it gave me a chance to see how complex codes grow over time. It was hard to deal with all the changes, but once we inserted an API beachhead for our code it was easier to make progress. It was thrilling to see our code actually work and do the things it was supposed to do, and we received positive feedback from the review committee for targeting the real hardware. I'll spend a good bit of next year revamping some of our code and pushing it so we can do more analysis operations.

Publications

  • SAND Report: Matt Bettencourt et al., "ASC ATDM Level 2 Milestone #6358: Assess Status of Next Generation Components and Physics Models in EMPIRE," SAND2018-10100