Recently I've been working with Carlos Maltzahn at the University of California, Santa Cruz (UCSC) on a new DOE ASCR project that is exploring how programmable network interface cards (or SmartNICs) can be used to improve data management services in HPC/HPDA platforms. To get a better idea of what current hardware can do, Jianshen Liu conducted a number of network and CPU experiments on NVIDIA BlueField-2 SmartNICs located in NSF's CloudLab. While the Arm cores on these cards are slow compared to modern host processors, they are sufficient for performing the kinds of operations we routinely need in the I/O space. Jianshen summarized what we learned in the following arXiv paper (UUR SAND2021-5854R).
High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics communities in understanding the roles these devices may play in the data paths of upcoming systems.
This paper focuses on characterizing both the networking and computing aspects of NVIDIA's new BlueField-2 SmartNIC when used in an Ethernet environment. For the networking evaluation we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much processing headroom is available on the SmartNIC during transfers. For the computing evaluation we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware.
Our findings from this work indicate that while the BlueField-2 provides a flexible means of processing data at the network's edge, great care must be taken to not overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC's embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than the general-purpose servers used for comparisons in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.
There was a lot of talk on local social media this week about a large military plane that flew over Livermore at a very low altitude. The town is already wound up about larger planes flying into Livermore because there's an expansion plan being discussed that would allow private charter 737s to land at the airport. Seeing a massive, loud military plane flying low over town made everyone wonder if that's what daily life is going to be like in the next few years.
There was a lot of speculation about what was going on with this flight. Some people thought it was emergency vaccine supplies being dropped off. Others claimed it was a military salute for a Lt. Col. in Pleasanton who was just awarded the Distinguished Flying Cross. In the end it turned out to be a C-17 Globemaster doing a practice landing approach at our municipal airport (see Brodie Brazil's video of the approach).
I went to my PiAware node and pulled the day's data to see what military flights took place on Wednesday. I found AE07E0 was active around the 4pm time period people were talking about. Interestingly, the flight used the call sign SOUND87, which made me wonder if this was some kind of sound test for the airport extension (I doubt it now; it seems like a routine test). As the below plots show, the plane flew in from the east, passed the airport, and made a sharp turn north. Zooming in on the east side of town, I noticed it flew right over LLNL at about 2,000 ft, just a block away from their $3.5B National Ignition Facility (NIF). I'd thought they had a no-fly zone over them, but that seems to only be for drones. Here is the raw data for the flight.
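For anyone curious how to do this kind of lookup themselves, the filtering step is simple. Below is a minimal sketch; it assumes a hypothetical log format where the tracker writes one JSON record per line with the fields dump1090 typically reports (hex, lat, lon, altitude). My actual logs differ in the details, but the idea is the same: grab every position report for the ICAO id of interest.

```python
import json

def flights_for_icao(path, icao="ae07e0"):
    """Pull all position reports for one ICAO hex id from a
    line-delimited JSON log (hypothetical schema for illustration)."""
    points = []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            # Keep records for this airframe that carry a position fix.
            if rec.get("hex", "").lower() == icao and "lat" in rec:
                points.append((rec["lat"], rec["lon"], rec.get("altitude")))
    return points
```

From there it's just a matter of handing the (lat, lon, altitude) tuples to a plotting library to draw the track.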
Debunking the Salute Theory
The idea that the military would dive-bomb a city to show its appreciation of a soldier bothered me, so I went to FlightRadar24 and pulled up the data for the whole flight. As seen below, they took off from Vegas, circled the Bay Area, and then dropped in on Livermore. After that, they flew north to Concord and did a similar practice approach at Concord's municipal airport (CCR) before landing at Travis AFB. Given that they didn't fly anywhere near Pleasanton and they made a second drop somewhere else, I'd guess this has nothing to do with the Lt. Col's award.
Questioning the Airport Expansion
One thing this flight really highlights is that Livermore people will notice larger planes flying to our airport. While the C-17 is much bigger and louder than the 737s the expansion is targeting, it made a lot of people realize that the airport approach really does stretch all the way across town, starting at the $3.5B big science experiment at the lab. I hope enough people stand up to the FAA and prevent larger planes from being able to land there.
Last Fall we purchased some NVIDIA Ampere A100 GPU cards to get a better understanding of how much they might impact some of our data-intensive workloads. Stefan Seritan dug into the details and put together this performance evaluation report.
The performance of NVIDIA's latest A100 graphics processing unit (GPU) is benchmarked for computing and data analytic workloads relevant to Sandia's missions. The A100 is compared to previous generations of GPUs, including the V100 and K80, as well as multi-core CPUs from two generations of AMD's EPYC processors, Zen and Zen 2. Computing workloads such as sparse matrix operations (e.g. the HPCG benchmark) and numerical solver-heavy applications based on Trilinos and Kokkos see moderate 1.5x to 2x speedups compared to the V100, consistent with the increased core count and memory bandwidth of the A100. Training and inference on machine learning (ML) models such as ResNet-50 for image classification and BERT-Large for natural language processing show the same 2x speedup over the V100.
However, these ML workloads also benefit from increased tensor core capabilities in the V100 and A100 GPUs, yielding a 3.5x speedup using a mixed (single + half) precision strategy for floating point operations. While the performance gap between GPUs and CPUs remains moderate (3x to 8x) for high-performance computing applications, these new hardware features of recent GPU generations give 50x to 100x speedups in out-of-the-box ML workloads compared to CPUs. With additional A100 features still undergoing testing (INT8, structural sparsity, multi-instance GPUs) with clear applications for ML workloads, the A100 GPU seems an extremely promising hardware accelerator for artificial intelligence (AI) and data analytics research at Sandia.
Joe Kenny has worked hard over the last few years to understand the tradeoffs of building a 100GigE network fabric for cluster computers that can run regular TCP-based applications and HPC applications that use MPI and RDMA. He put a paper together for INDIS at SC this year that ties together some of the simulation work he did with Jeremy Wilke and some of the practical experiments we've been doing on real hardware.
Priority-based Flow Control (PFC), RDMA over Converged Ethernet (RoCE) and Enhanced Transmission Selection (ETS) are three enhancements to Ethernet networks which allow increased performance and may make Ethernet attractive for systems supporting a diverse scientific workload. We constructed a 96-node testbed cluster with a 100 Gb/s Ethernet network configured as a tapered fat tree. Tests representing important network operating conditions were completed and we provide an analysis of these performance results. RoCE running over a PFC-enabled network was found to significantly increase performance for both bandwidth-sensitive and latency-sensitive applications when compared to TCP. Additionally, a case study of interfering applications showed that ETS can prevent starvation of network traffic for latency-sensitive applications running on congested networks. We did not encounter any notable performance limitations for our Ethernet testbed, but we found that practical disadvantages still tip the balance towards traditional HPC networks unless a system design is driven by additional external requirements.
One of the weirder news stories that came out when Trump announced he had COVID-19 was that the US's doomsday planes are now hovering, poised to send out missile launch commands to submarines. It looks like this started when someone on Twitter noticed that some of the US's Boeing E-6B planes were heading out to the oceans on the east and west coasts, and that these planes are the mobile command centers for coordinating with submarines. Twitter and Fox did what they do best and went off the rails trying to figure out what this all means. Fortunately, plane spotters like Christiaan Triebert and others properly dumped flight histories to show that these flights actually happen all the time. I didn't know anything about these planes so I spent the morning reading Wikipedia and looking through my data to see if I could find them. Yep! There are some in CA and they do show up all the time! Relax.
Boeing E-6 Mercury Planes
From Wikipedia, the Boeing E-6 Mercury is a variant of the Boeing 707 that was built for the military to relay communications to strategic forces in case ground systems are wiped out. There are 16 of these planes in use, and from ADS-B.NL you can learn that their ICAO ids are AE040D-AE041C (conveniently sequential in the military ICAO range). I've been leaving my flight tracker on all the time since the outbreak so I did some greps on my recent data. Sure enough, I found some hits in yesterday's data. Digging through all my data and plugging it into pandas yielded the below breakdown of how many days each plane flew near me over the last few months.
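The pandas step is short enough to sketch here. This assumes the raw tracker data has already been loaded into a DataFrame with a 'hex' column (ICAO id) and a 'date' column; the column names are mine, not a standard. Since the E-6B fleet occupies a contiguous ICAO range, building the id set is a one-liner.

```python
import pandas as pd

# The 16 E-6B airframes occupy the contiguous range AE040D-AE041C.
E6_IDS = {f"{i:06X}" for i in range(0xAE040D, 0xAE041C + 1)}

def e6_days_per_plane(df):
    """Count how many distinct days each E-6B showed up in the logs.
    Expects a DataFrame with 'hex' and 'date' columns (assumed schema)."""
    norm = df.assign(hex=df["hex"].str.upper())
    hits = norm[norm["hex"].isin(E6_IDS)]
    return hits.groupby("hex")["date"].nunique()
```

Multiple sightings on the same day collapse to one via nunique(), which is what you want when a plane loiters overhead for hours.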
As the above shows, I saw five different E-6 planes, with some of them being active as many as 10 days out of the month. While the tracker was up a lot of the time, there were some gaps in March, August, and September (the tracker crashed without me knowing it for a week; I powered it off for a few days when the garage was over 110 degrees; we had a few power outages during the fires).
Heading out to Sea
Looking through the tracks, there are several instances where the planes fly out to sea and circle around a lot. The following tracks are from July 26, August 13, and September 14. As these tracks show, flying out to sea is not an uncommon event.
Information is Surprise
Information theory elegantly defines "information" as a measurement of how much surprise is in the data. Things that happen all the time are not news. Unusual events are. Reporting on these "doomsday planes" without giving some background info is providing news, but the news for most people is just that the US has these planes at all. Taking a broader look at the data you find that these flights do not seem to be related to Trump's health, and that we don't need to assume the worst just yet.
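That notion of surprise has a precise form: Shannon's self-information, where an event with probability p carries -log2(p) bits. A quick sketch makes the point about routine flights:

```python
import math

def surprisal_bits(p):
    """Shannon self-information: bits of surprise in an event
    that occurs with probability p."""
    return -math.log2(p)

# A coin flip (p = 0.5) carries exactly 1 bit of surprise.
# A 1-in-1024 event carries exactly 10 bits.
```

An E-6 heading out to sea happens so often that its probability on any given day is high, so each sighting carries very little information. The real surprise in the news cycle was the planes' existence, not their flights.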