Craig Ulmer

Processing Particle Data Flows with SmartNICs

2022-09-23 pub smartnic hpc

In my SmartNICs project I've been working with UC Santa Cruz on new software that makes it easier to process particle data streams as they flow through the network. We've been using Apache Arrow to do a lot of the heavy lifting, because Arrow provides an easy-to-use tabular data representation and has excellent serialization, query, and compute functions. For this HPEC paper we converted three particle datasets to an Arrow representation and then measured how quickly Arrow could split data into smaller tables for a log-structured merge (LSM) tree implementation we're developing. Jianshen then dug into getting the BlueField-2's compression hardware to accelerate the unpacking/packing of data with a library he developed named Bitar. After HPEC we wrote an extended version of this paper for arXiv that includes some additional plots that had previously been cut due to page limits.
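For a rough sense of what the table-splitting step looks like, here's a minimal pyarrow sketch that breaks a table into per-key sub-tables. This is only an illustration of the general idea, not the paper's actual LSM partitioner, and the column names are made up.

```python
# Minimal pyarrow sketch: split a table into smaller per-key tables.
# Illustrative only; the paper's partitioner and column names differ.
import pyarrow as pa
import pyarrow.compute as pc

def split_by_key(table: pa.Table, key: str) -> dict:
    """Return one sub-table for each distinct value in the key column."""
    parts = {}
    for value in pc.unique(table[key]).to_pylist():
        mask = pc.equal(table[key], value)   # boolean mask for this key value
        parts[value] = table.filter(mask)    # new table with only matching rows
    return parts

# Toy example with a hypothetical particle table
hits = pa.table({"particle_id": [7, 7, 9, 9, 9], "x": [0.1, 0.2, 0.3, 0.4, 0.5]})
for pid, sub in split_by_key(hits, "particle_id").items():
    print(pid, sub.num_rows)
```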


The datasets for this were pretty fun. I pulled and converted particle data from CERN's TrackML Particle Identification challenge, airplane positions from the OpenSky Network, and ship positions from NOAA and Marine Cadastre. One of the benefits of working with Arrow is that it let us use existing tools to do a lot of the data preparation. I just used Pandas to read the initial data, restructure it, and save it out to compact parquet files that our tests could quickly load at runtime. Even though each dataset had a varying number of columns, our Arrow code could process each one so long as the position and ID columns had the proper labels.
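The conversion step looked roughly like the sketch below. The file names and column mappings here are placeholders rather than the actual dataset schemas.

```python
# Rough sketch of the Pandas conversion step; paths and column names are placeholders.
import pandas as pd

df = pd.read_csv("raw_hits.csv")                                  # original dataset
df = df.rename(columns={"hit_id": "id", "tx": "x", "ty": "y", "tz": "z"})
df = df[["id", "x", "y", "z"]].sort_values("id")                  # keep and order the needed columns
df.to_parquet("hits.parquet", index=False)                        # compact file the tests load at runtime
```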

Abstract

Many distributed applications implement complex data flows and need a flexible mechanism for routing data between producers and consumers. Recent advances in programmable network interface cards, or SmartNICs, represent an opportunity to offload data-flow tasks into the network fabric, thereby freeing the hosts to perform other work. System architects in this space face multiple questions about the best way to leverage SmartNICs as processing elements in data flows. In this paper, we advocate the use of Apache Arrow as a foundation for implementing data-flow tasks on SmartNICs. We report on our experiences adapting a partitioning algorithm for particle data to Apache Arrow and measure the on-card processing performance for the BlueField-2 SmartNIC. Our experiments confirm that the BlueField-2's (de)compression hardware can have a significant impact on in-transit workflows where data must be unpacked, processed, and repacked.

Publication


Employee Recognition Award for Globus Work

2022-05-17 networks

I won an individual Employee Recognition Award (ERA) for some work that I've been doing with Globus. At the award ceremony today I got to shake hands with the lab president and several VPs. Here's the ceremonial coin they gave me:


Globus

For the last few years I've worked as a "data czar" in a few large, multi-lab research projects. One of the problems with being the data czar is that you often need to find a way to move large amounts of data between partner labs. While some of the open science labs have good mechanisms for exchanging data, most labs have strict access controls on data egress/ingress via the Internet. As a result, it isn't uncommon for researchers to just bring hard drives with them when they have in-person meetings so they can physically hand datasets over to their colleagues. It's a clunky way to collaborate.

About 20 years ago, Grid Computing people solved this problem by setting up special data transfer nodes (DTNs) at the edge of the network that run software tuned for maximizing throughput on large (many-TB) transfers. They eventually spun the technology off as a company named Globus. Globus acts as a third party that users can access to coordinate transfers between different DTNs. While the free tier of service is sufficient for basic use, Globus makes money by selling enterprise subscriptions that have features that most users would want (encryption, throttling, extra security, etc.).

Sandia didn't have a Globus DTN, but our networking people were interested in seeing how well it could leverage our new 100Gbps site infrastructure. With the help of a few smart people, we worked through the lengthy approval process, stood up a DTN with 70TB of storage, and worked through more approval processes to connect to a few trial labs. While the initial transfer rates across the US were much lower than the raw link speed due to our HDDs, we were able to pull a few TBs of data in no time. Shortly after we let people know of our success, we heard from other projects that needed to transfer TBs of non-sensitive data to other locations. I didn't know about the ERA submission until it was already in flight; I would have asked for it to be done as a team award, since other people did the parts that were tough.


Pattern-of-Life Activity Recognition in Seismic Data

2022-04-15 pub data seismic

A collaborator in one of my research projects included me on a clustering paper that he'd gotten accepted in Applied Artificial Intelligence. Erick was interested in developing new algorithms that could help characterize different activities taking place in seismic data. As a data engineer on the project, I gathered data and did a lot of tedious, manual inspection to extract ground truth that the team could use to train their algorithms. You can tell I had a hand in making labels, given the descriptive category "workers unroll white thing".


Abstract

Pattern-of-life analysis models the observable activities associated with a particular entity or location over time. Automatically finding and separating these activities from noise and other background activity presents a technical challenge for a variety of data types and sources. This paper investigates a framework for finding and separating a variety of vehicle activities recorded using seismic sensors situated around a construction site. Our approach breaks the seismic waveform into segments, preprocesses them, and extracts features from each. We then apply feature scaling and dimensionality reduction algorithms before clustering and visualizing the data. Results suggest that the approach effectively separates the use of certain vehicle types and reveals interesting distributions in the data. Our reliance on unsupervised machine learning algorithms suggests that the approach can generalize to other data sources and monitoring contexts. We conclude by discussing limitations and future work.
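To give a sense of the overall flow the abstract describes (segment, extract features, scale, reduce, cluster), here's a minimal scikit-learn sketch. The window length, feature choices, and cluster count are placeholders and not the paper's actual parameters or algorithms.

```python
# Minimal sketch of an unsupervised segment-and-cluster pipeline for a 1-D waveform.
# Placeholders throughout; not the paper's actual features or settings.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def cluster_segments(waveform: np.ndarray, window: int = 1024, n_clusters: int = 8):
    # Break the waveform into fixed-length segments.
    n = len(waveform) // window
    segments = waveform[: n * window].reshape(n, window)
    # Simple per-segment features: energy plus a coarse spectrum.
    energy = np.sum(segments ** 2, axis=1, keepdims=True)
    spectrum = np.abs(np.fft.rfft(segments, axis=1))[:, :32]
    features = np.hstack([energy, spectrum])
    # Scale, reduce dimensionality, then cluster.
    scaled = StandardScaler().fit_transform(features)
    reduced = PCA(n_components=10).fit_transform(scaled)
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(reduced)
```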

Publication


SmartNICs for Data Management in HPC

2021-10-12 smartnic faodel hpc

Carlos Maltzahn invited me to give a talk about SmartNICs at the annual UCSC CROSS Research Symposium. I put the talk below together to cover some of our work in moving data with the BlueField SmartNICs. One of the things we have found recently is that while the current SmartNIC ARM processors are slower than x86_64 host processors, they can perform some data management tasks just as fast as other alternative processors being used in HPC (eg, KNL or ThunderX2 ARM). Here are the slides (UUR SAND2021-12527 PE).


Presentation


FAODEL 1.2108.1 Released

2021-10-08 faodel code

I am happy to report that we (finally) shipped the "Fluid" release of FAODEL on GitHub this week. Looking back now, it's been over two years since our last external release. We had intended to make the updates available last year after our big milestone project with SPARC completed, but we ran into a number of chicken-and-egg problems with other software libraries (eg, we can't test our release until you release updates that use our release). It also didn't help that we added a few new features while waiting on the release logistics to work themselves out. Continually adding "just one more thing" winds up resetting the release process a lot, which is lengthy when you have to test against multiple platforms, multiple NICs, and multiple software stacks. Todd Kordenbrock did an excellent job of working out all the painful details and finalizing everything. Thanks, Todd!


Highlights of New Features

I'm really pleased with some of the new features in this release:

  • Tracing/Playback: We added some tracing and playback tools to FAODEL that allow you to capture the list of operations a user is performing with a pool so that you can play them back at a later point in time. We found this to be very useful in our SPARC milestone because it lets you capture an application's behavior and then re-run it under different conditions, without having to run the application itself. Given that sims are always bulky and difficult to tweak, this feature is really handy for architecture studies. We found the playback option was also really useful for scenarios where we needed to set up pools in a specific way.
  • Kelpie Blast: There's a new option in the faodel command that blasts data at a pool in a way that's similar to simulations. This tool is handy for seeing what impact different pool configurations have on write-focused operations.
  • faodel-stress: Taking inspiration from stress-ng, we created a new tool called faodel-stress that measures how quickly a local compute node can perform non-network tasks that are relevant to data management services (eg, key sorting, hash maps, serialization); a toy illustration of these micro-tasks appears after this list. We've used these tests to compare the processors in different platforms and have found several deficiencies that we'll be reporting on soon.
  • RoCE: We've added support for RDMA over Converged Ethernet (RoCE). While we've typically focused on HPC platforms, we're seeing a number of Ethernet-based HPDA systems start to use RoCE to overcome TCP overheads. Adding support for RoCE allows us to run on our own Carnac platform and do experiments on some BlueField platforms in the cloud.
  • User-Defined Functions: We added a new, experimental API for performing user-defined functions in pool nodes. Functions are static and must be registered when the pool nodes start, but the interface provides a simple way for users to query a row and generate a single reply. This capability has been on our todo list since the beginning. This API will almost certainly change in the future, but I'm excited to think about how we can use this to make the pool nodes more active.
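As a flavor of the kinds of non-network micro-tasks faodel-stress exercises (it's a C++ tool), here's a toy Python analogue. This is not the tool itself, and the workload shapes and sizes are arbitrary.

```python
# Toy analogue of faodel-stress-style micro-tasks: key sorting, hash-map builds,
# and serialization. Illustrative only; not the actual tool or its workloads.
import pickle
import random
import string
import time

def timed(label, fn, *args):
    start = time.perf_counter()
    fn(*args)
    print(f"{label}: {time.perf_counter() - start:.3f}s")

keys = ["".join(random.choices(string.ascii_lowercase, k=16)) for _ in range(200_000)]
values = [{"key": k, "len": len(k)} for k in keys]

timed("key sort", sorted, keys)                              # key sorting
timed("hash map build", dict, zip(keys, range(len(keys))))   # hash-map inserts
timed("serialization", pickle.dumps, values)                 # packing objects
```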

The following are the more terse release notes from NEWS.md:

Release Improvements

  • Kelpie: Added Drop and RowInfo operations for remote use
  • Kelpie: Added ResultCollector to simplify async requests
  • Kelpie: New trace pool for client-side pool activities
  • Lunasa: Added DataObjectPacker for easier packing
  • Lunasa: Added StringObject for easier packing of text
  • NNTI: Added support for RoCE
  • faodel-cli: config-info dumps out configuration options
  • faodel-cli: New playback option that can use traces
  • faodel-cli: New kblast option generates parallel kelpie traffic
  • faodel-stress: New tool for benchmarking CPU for non-net activities
  • OpBox: Ability to capture timing traces for ops
  • Backburner: New notification methods to decrease CPU usage

Significant User-Visible Changes:

  • Kelpie: kv_row/col_info_t replaced by object_info_t (smaller, simpler)
  • config: "ioms" are now "kelpie.ioms"
  • config: backburner.notification_method for pipe, polling, sleep_polling
  • Examples can now be built inside the build via BUILD_EXAMPLES
  • OpBoxStandard is now OpBoxDeprecatedStandard

Experimental Features

  • kelpie 'Compute' allows server to perform computations on objects via UDFs