Craig Ulmer

Opportunistic Query Execution on SmartNICs

2023-09-26 pub hpc smartnics arrow

In our SmartNIC project we've been using Apache Arrow to represent and process the in-transit data that flows between different jobs in a workflow. One advantage of Arrow is that it includes a sophisticated compute engine named Acero that can execute queries on tabular data. Previously we wrote some basic queries in C++ to have Acero split the entries in a table based on a field. Lately we've been using Acero to execute queries that a user might create at runtime (via tools like DuckDB or Ibis that can generate Substrait query plans). Jianshen and I wrote some client/server code for Faodel that allows a client to transmit a serialized Substrait plan to an endpoint. The endpoint deserializes the requested objects into Arrow tables, applies the plan to the data, and sends the serialized results back to the client. This conduit gives us a handy way to query a remote SmartNIC and inspect its in-transit data.
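To make the round trip concrete, here is a minimal Python sketch of the client and endpoint halves of such a conduit. It is not Faodel's or Arrow's API: a JSON list of filters stands in for a serialized Substrait plan, lists of row dicts stand in for Arrow tables, the transport is omitted, and `make_request`, `handle_request`, and `store` are hypothetical names used only for illustration.

```python
import json

# Hypothetical stand-in for a Substrait plan: a list of (column, op, value)
# filters plus a projection, serialized as JSON. Real Substrait plans are
# protobuf messages executed by Acero; this only illustrates the round trip.
OPS = {
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "==": lambda a, b: a == b,
}


def make_request(object_ids, filters, columns):
    """Client side: serialize the plan and the object IDs it targets."""
    msg = {"objects": object_ids, "filters": filters, "columns": columns}
    return json.dumps(msg).encode("utf-8")


def handle_request(request_bytes, store):
    """Endpoint side: decode the plan, gather the requested objects into
    one table (a list of row dicts), apply the filters and projection,
    and return the serialized result."""
    req = json.loads(request_bytes.decode("utf-8"))
    rows = [row for oid in req["objects"] for row in store[oid]]
    for col, op, val in req["filters"]:
        rows = [r for r in rows if OPS[op](r[col], val)]
    result = [{c: r[c] for c in req["columns"]} for r in rows]
    return json.dumps(result).encode("utf-8")


# Example: two in-transit objects held by the endpoint.
store = {
    "obj0": [{"id": 0, "temp": 21.5}, {"id": 1, "temp": 30.2}],
    "obj1": [{"id": 2, "temp": 35.0}],
}
request = make_request(["obj0", "obj1"], [["temp", ">", 25.0]], ["id"])
resp = json.loads(handle_request(request, store))
print(resp)  # [{'id': 1}, {'id': 2}]
```

In the real system the endpoint hands the plan to Acero rather than interpreting filters itself; the sketch only shows the shape of the exchange: a plan goes out, and only the matching rows come back.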


For this paper (and his dissertation), Jianshen focused on building a decision engine that could quickly estimate whether it would be faster to execute a query at the SmartNIC or simply return the raw data and defer execution to the client. He measured the overheads of executing queries and transmitting data, and then used machine learning techniques to predict how long a query would take and how much data it would return. He used Apache DataSketches to rapidly characterize the in-transit data the SmartNIC held. At runtime, the decision engine parsed the query syntax and applied probabilities to each clause to estimate how selective the query would ultimately be.
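The decision described above boils down to comparing two predicted end-to-end times. The Python sketch below illustrates that comparison under explicit assumptions and is not Jianshen's engine: a sorted sample stands in for a DataSketches quantile sketch, clause probabilities are combined as if the clauses were independent, and fixed throughput numbers replace the learned models for execution time and result size. All function names, parameters, and numbers are hypothetical.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


# Selectivity estimates for single clauses. A sorted sample of a column
# stands in for a quantile sketch's CDF query (assumption, not DataSketches).
def sel_less_than(sorted_sample, threshold):
    """Estimated fraction of rows with value < threshold."""
    return bisect_left(sorted_sample, threshold) / len(sorted_sample)


def sel_greater_than(sorted_sample, threshold):
    """Estimated fraction of rows with value > threshold."""
    return 1.0 - bisect_right(sorted_sample, threshold) / len(sorted_sample)


# Combining clauses, assuming they are statistically independent.
def sel_and(p, q):
    return p * q


def sel_or(p, q):
    return p + q - p * q  # inclusion-exclusion


@dataclass
class CostModel:
    link_bytes_per_s: float    # NIC-to-client network bandwidth (assumed)
    nic_bytes_per_s: float     # query throughput on the SmartNIC (assumed)
    client_bytes_per_s: float  # query throughput on the client (assumed)

    def offload_time(self, n_bytes, selectivity):
        # Run the query on the NIC, then ship only the matching fraction.
        return (n_bytes / self.nic_bytes_per_s
                + selectivity * n_bytes / self.link_bytes_per_s)

    def pushback_time(self, n_bytes):
        # Ship all raw data, then run the query on the client.
        return (n_bytes / self.link_bytes_per_s
                + n_bytes / self.client_bytes_per_s)


def decide(model, n_bytes, selectivity):
    """Return "offload" if executing on the NIC is predicted to be faster."""
    if model.offload_time(n_bytes, selectivity) < model.pushback_time(n_bytes):
        return "offload"
    return "push-back"


# Example: "x > 89 AND y < 50" over two columns that each hold values
# 0..99 (one sample reused for both columns).
sample = list(range(100))
p = sel_and(sel_greater_than(sample, 89), sel_less_than(sample, 50))  # ~0.05
model = CostModel(link_bytes_per_s=1e9, nic_bytes_per_s=2e9,
                  client_bytes_per_s=4e9)
print(decide(model, 1e9, p))    # offload   (~0.55 s vs. 1.25 s)
print(decide(model, 1e9, 0.9))  # push-back (~1.40 s vs. 1.25 s)
```

Under these made-up numbers, a selective query is worth running on the NIC because it shrinks the data that crosses the network, while a query that returns most of the data gains little from offload when the NIC computes more slowly than the client.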


Abstract

High-performance computing (HPC) systems researchers have proposed using current, programmable network interface cards (or SmartNICs) to offload data management services that would otherwise consume host processor cycles in a platform. While this work has successfully mapped data pipelines to a collection of SmartNICs, users require a flexible means of inspecting in-transit data to assess the live state of the system. In this paper, we explore SmartNIC-driven opportunistic query execution, i.e., enabling the SmartNIC to make a decision about whether to execute a query operation locally (i.e., "offload") or defer execution to the client (i.e., "push-back"). Characterizations of different parts of the end-to-end query path allow the decision engine to make complexity predictions that would not be feasible for the client alone.

Publication

  • HPEC Paper: Jianshen Liu, Carlos Maltzahn, and Craig Ulmer, "Opportunistic Query Execution on SmartNICs for Analyzing In-Transit Data" in IEEE High Performance Extreme Computing, September 2023.