SmartNICs Project Final Report
Our DOE ASCR-funded "Offloading Data Management Services to SmartNICS" project published this 144-page unclassified unlimited release (UUR) technical report.
Abstract
Modern workflows for high-performance computing (HPC) platforms rely on data management and storage services (DMSSes) to migrate data between simulations, analysis tools, and storage systems. While DMSSes help researchers assemble complex pipelines from disjoint tools, they currently consume resources that ultimately increase the workflow's overall node count. In FY21-23 the DOE ASCR project "Offloading Data Management Services to SmartNICs" explored a new architectural option for addressing this problem: hosting services in programmable network interface cards (SmartNICs). This report summarizes our work in characterizing the NVIDIA BlueField-2 SmartNIC and defining a general environment for hosting services in compute-node SmartNICs that leverages Apache Arrow for data processing and Sandia's Faodel for communication. We discuss five different aspects of SmartNIC use. Performance experiments with Sandia's Glinda cluster indicate that while SmartNIC processors are an order of magnitude slower than servers, they offer an economical and power efficient alternative for hosting services.
Publication
- SAND Report Craig Ulmer, Jianshen Liu, Carlos Maltzahn, Aldrin Montana, Matthew L. Curry, Scott Levy, Whit Schonbein, and John Shawger, "Offloading Data Management Services to SmartNICs: Project Summary". SAND2024-03873, April 2024.
Presentations
- HPC Initiatives Slides: Presentation I gave at the SNL HPC Initiatives seminar in January.
- SRU Slides: Presentation I gave to an undergraduate Computer Engineering class at Slippery Rock University in October.