Scientific Workloads on 100GigE Fabrics

2020-11-12 net pub

Joe Kenny has worked hard over the last few years to understand the tradeoffs of building a 100GigE network fabric for cluster computers that can run regular TCP-based applications and HPC applications that use MPI and RDMA. He put a paper together for INDIS at SC this year that ties together some of the simulation work he did with Jeremy Wilke and some of the practical experiments we've been doing on real hardware.


Priority-based Flow Control (PFC), RDMA over Converged Ethernet (RoCE) and Enhanced Transmission Selection (ETS) are three enhancements to Ethernet networks which allow increased performance and may make Ethernet attractive for systems supporting a diverse scientific workload. We constructed a 96-node testbed cluster with a 100 Gb/s Ethernet network configured as a tapered fat tree. Tests representing important network operating conditions were completed and we provide an analysis of these performance results. RoCE running over a PFC-enabled network was found to significantly increase performance for both bandwidth-sensitive and latency-sensitive applications when compared to TCP. Additionally, a case study of interfering applications showed that ETS can prevent starvation of network traffic for latency-sensitive applications running on congested networks. We did not encounter any notable performance limitations for our Ethernet testbed, but we found that practical disadvantages still tip the balance towards traditional HPC networks unless a system design is driven by additional external requirements.


  • INDIS2020 Paper Joseph P. Kenny, Jeremiah J. Wilke, Craig D. Ulmer, Gavin M. Baker, Samuel Knight, and Jerrold A. Friesen, "An Evaluation of Ethernet Performance for Scientific Workloads", in 2020 IEEE/ACM Innovating the Network for Data-Intensive Science (INDIS).


Debunking the Doomsday Plane Hype

2020-10-02 planes

One of the weirder news stories that came out when Trump announced he had COVID-19 was that the US's doomsday planes are now hovering, poised to send out missile launch commands to submarines. It looks like this started when someone on Twitter noticed that some of the US's Boeing E-6B planes were heading out to the oceans on the east and west coasts, and that these planes are the mobile command centers for coordinating with submarines. Twitter and Fox did what they do best and went off the rails trying to figure out what this all means. Fortunately, plane spotters like Christiaan Triebert and others properly dumped flight histories to show that these flights actually happen all the time. I didn't know anything about these planes so I spent the morning reading Wikipedia and looking through my data to see if I could find them. Yep! There are some in CA and they do show up all the time! Relax.

Boeing E-6 Mercury Planes

From Wikipedia, the Boeing E-6 Mercury is a variant of the Boeing 707 that was made for the military to provide communication among resources in case ground systems are wiped out. There are 16 of these planes in use, and from ADS-B.NL you can learn that their ICAO ids are AE040D-AE041C (conveniently sequential in the military ICAO range). I've been leaving my flight tracker on all the time since the outbreak so I did some greps on my recent data. Sure enough, I found some hits in yesterday's data. Digging through all my data and plugging it into pandas yielded the below breakdown of how many days each plane flew near me over the last few months.

As the above shows, I saw five different E-6 planes, with some of them being active as many as 10 days out of the month. While the tracker was up a lot of the time, there were some gaps in March, August, and September (the tracker crashed without me knowing it for a week; I powered it off for a few days when the garage was over 110 degrees; we had a few power outages during the fires).
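The filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the post: the post used greps and pandas, while this sketch sticks to the standard library. The only values taken from the post are the ICAO range AE040D-AE041C; the record layout (an ICAO hex id paired with a date string) and the sample rows are made up for the example.

```python
from collections import defaultdict

# ICAO range quoted in the post for the E-6 fleet (inclusive on both ends).
E6_FIRST = 0xAE040D
E6_LAST = 0xAE041C


def is_e6(icao_hex):
    """Return True if a hex ICAO id falls inside the E-6 range."""
    try:
        value = int(icao_hex, 16)
    except ValueError:
        return False
    return E6_FIRST <= value <= E6_LAST


def days_seen_per_month(records):
    """Count distinct days each E-6 was seen, keyed by (icao, 'YYYY-MM').

    records: iterable of (icao_hex, 'YYYY-MM-DD') pairs. This layout is an
    assumption for the sketch, not the tracker's real output format.
    """
    days = defaultdict(set)
    for icao_hex, date in records:
        if is_e6(icao_hex):
            days[(icao_hex.upper(), date[:7])].add(date)
    return {key: len(value) for key, value in days.items()}


# Made-up sample rows, only to exercise the functions.
sample = [
    ("ae040d", "2020-07-26"),
    ("AE040D", "2020-07-26"),  # repeated sighting on the same day
    ("AE040D", "2020-07-27"),
    ("AE041C", "2020-08-13"),
    ("A12345", "2020-08-13"),  # outside the E-6 range
]
breakdown = days_seen_per_month(sample)
```

Because the ids are sequential, a numeric range check on the hex value is enough to pick out the whole fleet; counting distinct dates per plane avoids inflating the totals when a plane is heard many times in one day.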

Heading out to Sea

Looking through the tracks, there are several instances where the planes fly out to sea and circle around a lot. The following tracks are from July 26, August 13, and September 14. As these tracks show, flying out to sea is not an uncommon event.

Information is Surprise

Information theory elegantly defines "information" as a measurement of how much surprise is in the data. Things that happen all the time are not news. Unusual events are. Reporting on these "doomsday planes" without giving some background info is providing news, but the news for most people is just that the US has these planes at all. Taking a broader look at the data you find that these flights do not seem to be related to Trump's health, and that we don't need to assume the worst just yet.
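To put a rough number on "surprise": the standard measure is the self-information of an event, -log2(p), in bits. The sketch below is only an illustration. The probabilities are made up (10 days out of 30 loosely echoes the busiest planes mentioned earlier, and 1 in 1000 stands in for a genuinely rare event); they are not measurements.

```python
import math


def surprisal_bits(probability):
    """Self-information of an event with the given probability, in bits."""
    if not 0 < probability <= 1:
        raise ValueError("probability must be in (0, 1]")
    return -math.log2(probability)


# Illustrative probabilities only, not measured values.
routine = surprisal_bits(10 / 30)   # something seen about one day in three
rare = surprisal_bits(1 / 1000)     # something that almost never happens
```

An event that happens one day in three carries under two bits of surprise, while a one-in-a-thousand event carries close to ten, which is the sense in which routine flights are not news.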

EBRPD and Police Helicopters

2020-09-12 planes

Looking at the fire-fighting helicopter data made me wonder what other helicopters fly around our area. I didn't know if there was an identifier that the FAA uses to distinguish planes from helicopters, but when I looked at the FAA site I noticed that they had a database field that designated whether the aircraft was for government use or not. I skimmed through the list of aircraft registered in Alameda and found the following gov planes: two sheriff's Cessnas (N10CX and N5525U), three Oakland police helicopters (N220PD, N330PD, and N510PD), and two helicopters registered to the East-Bay Regional Park District (N708PD and N996PD). I didn't know the EBRPD had helicopters, but thinking about it, it's not surprising. EBRPD manages parks all over the bay area, many of which are far off into the hills. Their operations page talks about how they use the helicopters to support police tasks, fight fires (they have their own water buckets), and do rescue work (I've thought about this on some of the long bike rides I've done with the kids back into Morgan Territory).

Fire Inspections

Looking through my flight data I found that the EBRPD helicopters are sent out periodically to check on the different parks. The fires from the lightning strikes the other week have been awful though, so it looks like the EBRPD helicopters have been doing more flights over the areas. The below track from September 5th shows how they go out to Chabot, Morgan Territory, Diablo, Sunol, the Pleasanton Ridge, and Brushy Peak.

Police Activity

Looking through the data some more I spotted a few of the Oakland Police Department helicopters. So far, these usually seem to fly out from Oakland to check things out along the interstate, but sometimes they seem to be circling some activity like the below (maybe something was happening at the Oakland Zoo?). One of the interesting things about all this is that while they broadcast their location over ADS-B, aggregator sites like FlightRadar don't report the PD helicopter positions (I think plane owners can request that their tracks not be reported on these sites). The data is out there, you just have to know how to get it.


Fire-Fighting Helicopter in Livermore

2020-09-06 planes

Someone on the Livermore version of NextDoor posted yesterday that there was a helicopter hovering in the south part of town and that they wanted to know if anyone knew what was up. As usual, the brighter minds of Livermore had their "government helicopter" conspiracy theories, so I decided to take a look at the ADS-B data I'd collected to see if there were any helicopters in it. I'd never tried looking specifically for helicopters, so I just skimmed the ID list and checked out everything that didn't match a prefix for a known airline (e.g., SWA for Southwest). That turned up N404AJ, a twin-rotor Chinook operated by Billings Flying Service that's been helping put out the fires south of Livermore.
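The skim-and-exclude approach can be sketched in a few lines. This is a hedged illustration rather than the script behind the post: the prefix list is a short hypothetical sample of airline ICAO designators (only SWA is named in the post), and the list of observed IDs is invented apart from the two tail numbers that appear on this blog.

```python
# Hypothetical short list of airline call sign prefixes; only SWA comes
# from the post, the others are added for illustration.
AIRLINE_PREFIXES = ("SWA", "UAL", "DAL", "AAL")


def non_airline_ids(callsigns, prefixes=AIRLINE_PREFIXES):
    """Return sorted unique IDs that do not start with an airline prefix."""
    cleaned = {c.strip().upper() for c in callsigns if c and c.strip()}
    return sorted(c for c in cleaned if not c.startswith(prefixes))


# Invented sample of IDs as a tracker might log them (padding, mixed case,
# blanks); the flight numbers are made up.
seen = ["SWA1234 ", "N404AJ", "UAL88", "n404aj", "", "N220PD"]
leftovers = non_airline_ids(seen)
```

Whatever survives the filter still has to be checked by hand, since private planes and helicopters both show up under N-numbers.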


It looks like the helicopter left the area yesterday to fight the new fire near Fresno, but these two tracks I captured show how they've been using Livermore as a staging ground to fight fires south of town. They've been landing near Poppy Ridge at Meadowlark Field, which also happens to have a giant Star Wars Rebel Alliance symbol in the field next to it.

Looking around the web some more, I found some photos of the helicopter and the video at the top of this post about how BFS contracted someone to build a tank module so they can dump fire suppressant on fires. So no, tin-foil-hatters of Livermore, not a secret government operation, just fire fighters protecting your McMansions.

Searching Publications for Proliferation Activities

2020-07-13 pub

One of the projects that I've been supporting this year is applying natural language processing techniques to a technical publication dataset to help identify activities that might be related to nuclear proliferation work. As the "data czar" for this project, I investigated multiple sources (e.g., Scopus, Semantic Scholar, OSTI) where we could get a baseline corpus of technical papers we could inspect. In the end, the source we went with was the least difficult to obtain and had a lot of interesting info in it. I pulled a large chunk of it, organized it, and did the first pass data engineering to get it into a usable form (turns out there are literally hundreds of ways Sandians identify their laboratories). Jon, Danny, and Zoe did a good bit of analysis on this data, which resulted in this paper at the Institute of Nuclear Materials Management (INMM) annual meeting.


Scientific and technical publications can provide relevant information regarding the technical capabilities of a state, the location of nuclear materials and related research activities within that state, and international partnerships and collaborations. Nuclear proliferation analysts monitor scientific and technical publications using complex word searches defined by fuel cycle experts as part of their collection and analysis of all potentially relevant information. These search strings have been refined over time by fuel cycle experts and other analysts but represent a top-down approach that is inherently defined by the requirement of term presence. In contrast, we are developing a bottom-up approach in which we develop topic models from a small number of expert refereed source documents to search similar topic space, with the hope that we can use this method to identify publications that are relevant to the proliferation detection problem space without necessarily conforming to the expert-derived rule base. We are comparing our results of various topic modeling and clustering techniques to traditional analyst search strings to determine how well our methods work to find seed documents. We also present how our methods provide added benefit over traditional search by organizing the retrieved documents into topic-oriented clusters. Finally, we present distributions of author institutions to facilitate a broader perspective of the content of interest for analysts.


  • INMM Paper Jonathan Bisila, Daniel M. Dunlavy, Zoe N. Gastelum, and Craig D. Ulmer, "Topic Modeling with Natural Language Processing for Identification of Nuclear Proliferation-Relevant Scientific and Technical Publications", in Proceedings of the Institute of Nuclear Materials Management (INMM) 61st Annual Meeting, July 12, 2020.


  • INMM Slides Presentation that Jon gave virtually at INMM.