Looking at the fire-fighting helicopter data made me wonder what other helicopters fly around our area. I didn't know if there was an identifier the FAA uses to distinguish planes from helicopters, but when I looked at the FAA site I noticed they had a database field that designated whether an aircraft was for government use or not. I skimmed through the list of aircraft registered in Alameda and found the following government aircraft: two sheriff's Cessnas (N10CX and N5525U), three Oakland police helicopters (N220PD, N330PD, and N510PD), and two helicopters registered to the East Bay Regional Park District (N708PD and N996PD). I didn't know the EBRPD had helicopters, but thinking about it, it's not surprising. EBRPD manages parks all over the Bay Area, many of which are far off into the hills. Their operations page talks about how they use the helicopters to support police tasks, fight fires (they have their own water buckets), and do rescue work (I've thought about this on some of the long bike rides I've done with the kids back into Morgan Territory).
Looking through my flight data I found that the EBRPD helicopters are sent out periodically to check on the different parks. The fires from the lightning strikes the other week have been awful, though, so it looks like the EBRPD helicopters have been flying over those areas more often. The track below from September 5th shows how they go out to Chabot, Morgan Territory, Diablo, Sunol, the Pleasanton Ridge, and Brushy Peak.
Looking through the data some more I spotted a few of the Oakland Police Department helicopters. So far, these usually seem to fly out from Oakland to check things out along the interstate, but sometimes they circle some activity like the one below (maybe something was happening at the Oakland Zoo?). One of the interesting things about all this is that while they broadcast their location over ADS-B, aggregator sites like FlightRadar don't report the PD helicopter positions (I think plane owners can request that their tracks not be shown on these sites). The data is out there, you just have to know how to get it.
Someone on the Livermore version of NextDoor posted yesterday that there was a helicopter hovering in the south part of town and wanted to know if anyone knew what was up. As usual, the brighter minds of Livermore trotted out the standard "government helicopter" conspiracy theories, so I decided to take a look at the ADS-B data I'd collected to see if there were any helicopters in it. I'd never tried looking specifically for helicopters, so I just skimmed the ID list and checked out everything that didn't match a prefix for a known airline (e.g., SWA for Southwest). That turned up N404AJ, a twin-rotor Chinook operated by Billings Flying Service that's been helping put out the fires south of Livermore.
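For anyone who wants to try the same trick, the filtering step is easy to script. Here's a minimal sketch of the idea (the prefix list and sample callsigns are illustrative, not my actual data): given callsigns pulled from ADS-B messages, drop anything that starts with a known airline prefix and keep the rest as candidate general-aviation traffic.

```python
# Minimal sketch: filter ADS-B callsigns down to non-airline traffic.
# The airline prefix list here is illustrative, not exhaustive.
AIRLINE_PREFIXES = {"SWA", "UAL", "AAL", "DAL", "SKW", "ASA", "FDX", "UPS"}

def non_airline(callsigns):
    """Keep callsigns that don't start with a known airline prefix."""
    return [c for c in callsigns
            if not any(c.strip().upper().startswith(p) for p in AIRLINE_PREFIXES)]

# Example: N-numbers (US registrations) pass through, airline flights drop out.
seen = ["SWA1543", "N404AJ", "UAL233", "N510PD"]
print(non_airline(seen))  # ['N404AJ', 'N510PD']
```

Anything that survives the filter still needs a lookup in the FAA registry to tell a Chinook from a Cirrus, but it narrows the haystack considerably.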
It looks like the helicopter left the area yesterday to fight the new fire near Fresno, but these two tracks I captured show how they've been using Livermore as a staging ground to fight fires south of town. They've been landing near Poppy Ridge at Meadowlark Field, which also happens to have a giant Star Wars Rebel Alliance symbol in the field next to it.
Looking around the web some more, I found some photos of the helicopter and the video at the top of this post about how BFS contracted someone to build a tank module so they can dump fire suppressant on fires. So no, tin-foil-hatters of Livermore, not a secret government operation, just firefighters protecting your McMansions.
One of the projects that I've been supporting this year is applying natural language processing techniques to a technical publication dataset to help identify activities that might be related to nuclear proliferation work. As the "data czar" for this project, I investigated multiple sources (e.g., Scopus, Semantic Scholar, OSTI) from which we could get a baseline corpus of technical papers to inspect. In the end, OSTI.gov was the least difficult to obtain data from and had a lot of interesting info in it. I pulled a large chunk of it, organized it, and did the first-pass data engineering to get it into a usable form (turns out there are literally hundreds of ways Sandians identify their laboratories). Jon, Danny, and Zoe did a good bit of analysis on this data, which resulted in this paper at the Institute of Nuclear Materials Management (INMM) annual meeting.
Scientific and technical publications can provide relevant information regarding the technical capabilities of a state, the location of nuclear materials and related research activities within that state, and international partnerships and collaborations. Nuclear proliferation analysts monitor scientific and technical publications using complex word searches defined by fuel cycle experts as part of their collection and analysis of all potentially relevant information. These search strings have been refined over time by fuel cycle experts and other analysts but represent a top-down approach that is inherently defined by the requirement of term presence. In contrast, we are developing a bottom-up approach in which we develop topic models from a small number of expert-refereed source documents to search similar topic space, with the hope that we can use this method to identify publications that are relevant to the proliferation detection problem space without necessarily conforming to the expert-derived rule base. We compare the results of various topic modeling and clustering techniques against traditional analyst search strings to determine how well our methods work to find seed documents. We also present how our methods provide added benefit over traditional search by organizing the retrieved documents into topic-oriented clusters. Finally, we present distributions of author institutions to facilitate a broader perspective of the content of interest for analysts.
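The paper's actual pipeline uses real topic models, but the bottom-up idea can be sketched with nothing more than term-frequency vectors and cosine similarity: score corpus documents by how close they sit to a small set of expert-refereed seed documents, rather than by whether they contain mandated search terms. A toy version (all documents here are made up for illustration):

```python
import math
from collections import Counter

def tf_vector(text):
    """Bag-of-words term-frequency vector for a document."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rank_by_seed_similarity(seed_docs, corpus):
    """Score each corpus doc by its best similarity to any seed document."""
    seeds = [tf_vector(d) for d in seed_docs]
    scored = [(max(cosine(tf_vector(doc), s) for s in seeds), doc) for doc in corpus]
    return sorted(scored, reverse=True)

seeds = ["uranium enrichment centrifuge cascade design"]
corpus = ["centrifuge rotor dynamics and cascade modeling",
          "deep learning for cat photo classification"]
for score, doc in rank_by_seed_similarity(seeds, corpus):
    print(f"{score:.2f}  {doc}")
```

A document about rotor dynamics ranks high without containing any mandated search term verbatim, which is the point of going bottom-up.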
Recently, I needed to dig up some references to papers I either read or wrote back in the 1990s about programmable network interface cards. Out of curiosity, I did a search for my own dissertation on the web to see if it's floating around somewhere, 17 years after I published it. It didn't surprise me that some of the slide-scraping sites had copies of my defense presentation slides (I made these available on my Ga Tech website). However, I was surprised to find two sites that claimed to have an electronic copy of the actual dissertation since I had never made it available. As it turns out, the Georgia Tech library scanned in the paper version a few years after I graduated (cool!). The other place was some scraper site in China (maybe not so surprising).
The GT Library webpage said they didn't have permission to share the dissertation with people outside GT, so I contacted them and submitted the paperwork to make it world readable. The PDF download was disappointing, though: it was 40MB in size (!) and had scanner burn on several of the pages. It occurred to me that I could generate a better version, resurrected from my old files. That snowballed into a lot more work than I wanted, but I finally finished it and have added it to this website. Wading through it has given me an opportunity to reflect on what I wrote.
Converting to LaTeX
Resurrecting my dissertation was an absolute chore. At the time, my advisor was curious as to whether modern WYSIWYG editors were solid enough for a dissertation, so he suggested that we buck the time-honored tradition of using LaTeX and have me write it in MS Word 2000. It seemed like a valid, harmless decision when I started, but by the end of the writing it was a constant battle to get the document done before Word corrupted it in some unfixable way. To this day, I still have a fear that I'll open a Word document and all my section headers will have a mysterious "Char Char Char" phrase prepended to the title. It was handy to be able to use PowerPoint and Excel to do my figures and plots, though. Plus, my advisor periodically used the Track Changes feature to get me comments and corrections. It just would have been nicer to have something in between Word (hard to control precisely) and LaTeX (hard to view while writing).
For the conversion process, I loaded each chapter into LibreOffice and then exported to either text or LaTeX depending on how complicated the text was (the LaTeX export always seemed to spew a lot of extra junk that needed to be filtered out). GT had a standard thesis/dissertation template available that did most of the document boilerplate work for me. The hard part of this process was writing a bunch of one-off awk/grep scripts to correct all the formatting mistakes that happened during export. Importing all the figures was another problem, but I found the modern version of Word let me save my PowerPoint figures and plots to PDF, which I could then trim with Linux tools. Done. The last chore was proofreading the text and fixing the bibliography. 17 years is a long time for references to stay valid, and many of the product white papers have simply disappeared. In the end I think I produced a pretty decent spin of my dissertation that's only 1.6MB in size. I've added a post with the dissertation, backdated to 11/19/2002 when it all happened.
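The cleanup scripts themselves were throwaway one-liners, but the flavor was roughly this (the patterns below are hypothetical stand-ins, not the actual junk LibreOffice emitted for my chapters): run the exported LaTeX through a pile of regex substitutions until only plain markup remains.

```python
import re

# Hypothetical examples of the kind of noise a WYSIWYG-to-LaTeX export
# sprinkles through the text; the real one-off awk/grep scripts targeted
# whatever each chapter actually produced.
CLEANUPS = [
    (re.compile(r"\\textstyleDefault\{([^}]*)\}"), r"\1"),  # unwrap style macros
    (re.compile(r"\\bigskip\s*\\bigskip"), r"\\bigskip"),   # collapse doubled spacing
    (re.compile(r"[ \t]+\n"), "\n"),                        # strip trailing whitespace
]

def clean(tex):
    """Apply each cleanup substitution in order to the exported LaTeX."""
    for pattern, repl in CLEANUPS:
        tex = pattern.sub(repl, tex)
    return tex

print(clean(r"\textstyleDefault{Chapter one text.}"))  # Chapter one text.
```

Each chapter needed its own additions to the list, which is why it never grew into anything more reusable than this.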
Better Material than Expected
I'll admit that when I started reading my dissertation I had low expectations about the content. While I put a lot of work into my research topic, I've always felt like it was a 5% research / 95% development effort. Everyone that starts grad school thinks they'll hit on some keen idea that will beat quicksort, get around the Nyquist sampling rate limits (compressed sensing kinda did!), or cure cancer. Over time, most people realize that the idea tree was picked clean by the 1960s, and that most of what we've been doing since then is reacting to improvements in technology. Still, there's a lot of snobbery among researchers that if you're not writing a lot of theorems, lemmas, and QEDs in your papers, you're not doing research. My dissertation had zero proofs, so I've always felt like I messed up somewhere.
Reading the text again though, I realized I explored a lot of ideas that people hadn't dug into much at the time, and that some of those ideas have only become important to others in the last decade. My thesis was about how you could design a message layer that ran on a programmable NIC and managed all the gritty details of communication so that both host CPUs and peripheral devices could access the network. My work did all the things other people did at the time (low-latency, high-bandwidth messages between hosts, RDMAs to physical and virtual memory, network-interface-based multicast!), plus it let you steer data to multimedia cards (video capture/display, FPGA accelerators, and storage cards). In retrospect, this kind of thing became a lot more important 5 years later when people needed a way to route data between GPUs, or more recently when vendors returned to building SmartNICs so people could embed operations in the fabric. While my dissertation had zero impact on any of this, it at least feels good to look back on it and see that I was on the right track.
The main negative I had about my dissertation was that it was simply too long and filled with details that nobody would care about. After five years of Ph.D. work, I had a chip on my shoulder and wanted to write about every single aspect of what I had done, no matter how boring it was. I understand now that conciseness is the key to good writing, and that giant chunks of text could have been moved to an appendix or dropped entirely. When profs commented about how much text there was, I remember telling them I wanted it there so I'd have it for myself to read later. Well, grad-school-Craig, mid-career-Craig wants you to know he appreciates the sentiment, but he doesn't want to read all of that either. As they say out here in future land, ain't nobody got time for that.
Reading my dissertation reminded me of all the great conversations I had with my advisor, Sudha Yalamanchili, during those years (and in later work visits). Last year Sudha passed away after a long, quiet fight with cancer. GT was not exactly a friendly school, but Sudha always had an optimism to him that made me want to stay longer and try out new ideas. While I made grad school go on longer than it should have, I'm proud of the work I did with this dissertation and am glad that I had Sudha to guide me through the whole process.
Scott did a lot of work over the last year collecting stats about how Lunasa (FAODEL's memory management component) can be used to improve performance in different types of communication scenarios. It turns out there are a lot of dirty secrets hidden in NIC device drivers; for example, the cost of simply de-registering memory can be pretty significant. Scott put a paper together making the case for using explicit memory handles when dealing with network data, instead of letting the communication layer take care of everything. There's a good Kokkos use case in the paper that gives an idea of how HPC is evolving, and he has numbers for both Mutrino (Cray/Gemini) and Stria (ARM/InfiniBand).
Remote Direct Memory Access (RDMA) is an increasingly important technology in high-performance computing (HPC). RDMA provides low-latency, high-bandwidth data transfer between compute nodes. Additionally, it does not require explicit synchronization with the destination processor. Eliminating unnecessary synchronization can significantly improve the communication performance of large-scale scientific codes. A long-standing challenge presented by RDMA communication is mitigating the cost of registering memory with the network interface controller (NIC). Reusing memory once it is registered has been shown to significantly reduce the cost of RDMA communication. However, existing approaches for reusing memory rely on implicit memory semantics. In this paper, we introduce an approach that makes memory reuse semantics explicit by exposing a separate allocator for registered memory. The data and analysis in this paper yield the following contributions: (i) managing registered memory explicitly enables efficient reuse of registered memory; (ii) registering large memory regions to amortize the registration cost over multiple user requests can significantly reduce the cost of acquiring new registered memory; and (iii) reducing the cost of acquiring registered memory can significantly improve the performance of RDMA communication. Reusing registered memory is key to high-performance RDMA communication. By making reuse semantics explicit, our approach has the potential to improve RDMA performance by making it significantly easier for programmers to efficiently reuse registered memory.
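The core idea in the abstract, registering one large region up front and handing out sub-allocations so the registration cost is amortized across many requests, can be sketched in a few lines. This is a conceptual toy with a fake registration counter standing in for the expensive NIC call, not Lunasa's actual API:

```python
class RegisteredPool:
    """Toy model of an explicit registered-memory allocator: pay the NIC
    registration cost once per large region, then hand out chunk handles."""
    def __init__(self, region_size=1 << 22, chunk=4096):
        self.region_size, self.chunk = region_size, chunk
        self.registrations = 0  # stand-in for expensive NIC registration calls
        self.free = []

    def _register_region(self):
        # One "registration" makes region_size/chunk handles available.
        self.registrations += 1
        self.free.extend(range(self.region_size // self.chunk))

    def alloc(self):
        if not self.free:
            self._register_region()
        return self.free.pop()

    def release(self, handle):
        # Explicit reuse: released handles go back to the pool;
        # nothing is de-registered.
        self.free.append(handle)

pool = RegisteredPool()
handles = [pool.alloc() for _ in range(1000)]
for h in handles:
    pool.release(h)
print(pool.registrations)  # 1: one registration served all 1000 requests
```

The contrast with implicit schemes is that the caller decides when a buffer returns to the registered pool, rather than the communication layer guessing from allocation patterns.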