I am happy to report that we (finally) shipped the "Fluid" release of FAODEL on GitHub this week. Looking back now, it's been over two years since our last external release. We had intended to make the updates available last year after our big milestone project with SPARC completed, but we ran into a number of chicken-and-egg problems with other software libraries (e.g., we can't test our release until you release updates that use our release). It also didn't help that we added a few new features while waiting on the release logistics to work themselves out. Continually adding "just one more thing" winds up resetting the release process a lot, and that process is lengthy when you have to test against multiple platforms, multiple NICs, and multiple software stacks. Todd Kordenbrock did an excellent job of working out all the painful details and finalizing everything. Thanks, Todd!
Highlights of New Features
I'm really pleased with some of the new features in this release:
- Tracing/Playback: We added some tracing and playback tools to FAODEL that allow you to capture the list of operations a user is performing with a pool so that you can play them back at a later point in time. We found this to be very useful in our SPARC milestone because it lets you capture an application's behavior and then re-run it under different conditions, without having to run the application itself. Given that sims are always bulky and difficult to tweak, this feature is really handy for architecture studies. We found the playback option was also really useful for scenarios where we needed to set up pools in a specific way.
- Kelpie Blast: There's a new option in the faodel command that blasts data at a pool in a way that's similar to simulations. This tool is handy for seeing what the impact of different pool configurations will be on write-focused operations.
- faodel-stress: Taking inspiration from stress-ng, we created a new tool called faodel-stress that measures how quickly a local compute node can perform non-network tasks that are relevant to data management services (e.g., key sorting, hash maps, serialization). We've used these tests to compare the processors in different platforms and have found several deficiencies that we'll be reporting on soon.
- RoCE: We've added support to work with RDMA over Converged Ethernet (RoCE). While we've typically focused on HPC platforms, we're seeing a number of HPDA systems with Ethernet start to use RoCE to overcome TCP overheads. Adding support for RoCE allows us to run on our own Carnac platform and do experiments on some BlueField platforms in the clouds.
- User-Defined Functions: We added a new, experimental API for performing user-defined functions in pool nodes. Functions are static and must be registered when the pool nodes start, but the interface provides a simple way for users to query a row and generate a single reply. This capability has been on our to-do list since the beginning. This API will almost certainly change in the future, but I'm excited to think about how we can use this to make the pool nodes more active.
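To make the tracing/playback idea a bit more concrete, here is a minimal Python sketch of the general record-and-replay pattern. It is not FAODEL's API or trace format: the `TracingPool` class, the `TraceEvent` record, the JSON-lines layout, and the `playback` helper are all hypothetical names made up for illustration, and the "pool" is just an in-memory dictionary.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class TraceEvent:
    """One recorded pool operation (hypothetical format, not FAODEL's)."""
    op: str    # "publish" or "want"
    key: str
    size: int  # payload size in bytes


class TracingPool:
    """Dict-backed stand-in for a pool that logs every operation it sees."""

    def __init__(self):
        self.store = {}
        self.events = []

    def publish(self, key, data):
        self.store[key] = data
        self.events.append(TraceEvent("publish", key, len(data)))

    def want(self, key):
        data = self.store.get(key)
        self.events.append(TraceEvent("want", key, len(data) if data else 0))
        return data

    def dump_trace(self):
        # One JSON object per line so a trace can be replayed line by line.
        return "\n".join(json.dumps(asdict(e)) for e in self.events)


def playback(trace_text, pool):
    """Re-issue recorded operations against another pool.

    Payload contents are not stored in the trace, so publishes are
    replayed with zero-filled buffers of the recorded size.
    """
    for line in trace_text.splitlines():
        ev = TraceEvent(**json.loads(line))
        if ev.op == "publish":
            pool.publish(ev.key, bytes(ev.size))
        elif ev.op == "want":
            pool.want(ev.key)
```

The point of the pattern is that the trace only needs keys, sizes, and ordering. Once you have that, you can replay the same access pattern against a differently configured pool without running the original application.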
The following are the more terse release notes from NEWS.md:
- Kelpie: Added Drop and RowInfo operations for remote use
- Kelpie: Added ResultCollector to simplify async requests
- Kelpie: New trace pool for client-side pool activities
- Lunasa: Added DataObjectPacker for easier packing
- Lunasa: Added StringObject for easier packing of text
- NNTI: Added support for RoCE
- faodel-cli: config-info dumps out configuration options
- faodel-cli: New playback option that can use traces
- faodel-cli: New kblast option generates parallel kelpie traffic
- faodel-stress: New tool for benchmarking CPU for non-net activities
- OpBox: Ability to capture timing traces for ops
- Backburner: New notification methods to decrease CPU usage
Significant User-Visible Changes:
- Kelpie: kv_row/col_info_t replaced by object_info_t (smaller, simpler)
- config: "ioms" are now "kelpie.ioms"
- config: backburner.notification_method for pipe, polling, sleep_polling
- Examples can now be built inside the build via BUILD_EXAMPLES
- OpBoxStandard is now OpBoxDeprecatedStandard
- kelpie 'Compute' allows server to perform computations on objects via UDFs
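For readers wondering what "static functions registered at startup" looks like in practice, the following Python sketch shows the general shape of the idea. It is only an illustration under my own assumptions and not the Kelpie API: `UDF_REGISTRY`, `register_udf`, `compute`, and the two example functions are hypothetical, and the pool is modeled as a plain dictionary of rows.

```python
# Hypothetical registry mapping a function name to a callable that takes
# one row (column name -> bytes) and returns a single reply.
UDF_REGISTRY = {}


def register_udf(name):
    """Decorator that records a function under `name` when the node starts."""
    def wrap(fn):
        UDF_REGISTRY[name] = fn
        return fn
    return wrap


@register_udf("total_bytes")
def total_bytes(row):
    # Reply with the combined size of every object in the row.
    return str(sum(len(v) for v in row.values()))


@register_udf("pick_largest")
def pick_largest(row):
    # Reply with the name of the column holding the largest object.
    return max(row, key=lambda col: len(row[col]))


def compute(pool, row_name, udf_name):
    """Server-side dispatch: find the row, run the named function, reply once."""
    fn = UDF_REGISTRY[udf_name]  # raises KeyError if it was never registered
    return fn(pool[row_name])
```

With a toy pool such as `{"row1": {"a": b"xx", "b": b"yyyy"}}`, calling `compute(pool, "row1", "total_bytes")` runs the function next to the data and sends back only the small result, which is the appeal of doing this work in the pool nodes.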