Craig Ulmer is a Computer Engineer in the
Visualization and Scientific Computing group at
Sandia National Laboratories
in Livermore, California. He is currently investigating how revolutionary hardware technologies (Flash Memory,
FPGAs, and GPUs) can be utilized to change the way data-intensive applications are designed. This work is
an extension of a
project that Craig led from 2003 to 2006 that used Field-Programmable Gate Arrays (FPGAs) as a
means of accelerating computational kernels that are important in scientific applications. During his five
years at Sandia, Craig has worked in a variety of application domains, including high-performance
computing (HPC) architectures, wireless sensor networks, post-processing tools, and real-time network intrusion detection.
Prior to joining Sandia, Craig received a Ph.D. in Electrical and Computer Engineering from the
Georgia Institute of Technology
for his work on low-level communication libraries for cluster computers. This research resulted in a flexible message layer named GRIM that enables users to efficiently use hardware accelerators and multimedia devices distributed throughout a cluster. While
attending Georgia Tech, Craig completed a total of two years of internship and co-op work assignments at NASA's Jet Propulsion Laboratory and the Digital Technology Center.
In a survey course I took when I first started college, a professor described Computer Engineers as "tall skinny" people who need to know a little about a lot of different things (as opposed to other engineers, who know a lot
about a little). While the upside of this field is that you get to dive into a number of different application domains,
the downside is that it can be incredibly frustrating because there is never enough time to explore all of the
interesting paths. My current short list of research interests includes the following.
Storage Technologies for Large Data Problems
In early 2007 we met a few engineers from a startup named Fusion-io who were working on a high-performance PCIe storage product built from NAND flash memory. We became early evaluators of their
hardware and confirmed that a single card could outperform a stack of hard drives. What's more, we found that
flash performance actually improved as workload increased. Since then I have been examining how data-intensive
algorithms can be refactored to take advantage of flash memory. We've also been fortunate enough to obtain
a few other pre-release flash devices from other vendors that give us a good idea of where the storage world is headed.
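The effect is easy to see with a concurrency sweep: flash devices have enough internal parallelism that random-read throughput keeps climbing as more requests are kept in flight, where a single hard drive would quickly flatten out. Below is a minimal sketch of that kind of micro-benchmark; the device path, block size, and thread counts are placeholders, and a serious measurement would use direct I/O and much longer runs.

# Sketch of a concurrent random-read micro-benchmark for a flash device.
# DEVICE, BLOCK, SPAN, and the thread counts are hypothetical placeholders;
# adjust them for the hardware under test.
import os, random, threading, time

DEVICE = "/dev/fioa"      # placeholder path to the flash device
BLOCK  = 4096             # bytes per read
SPAN   = 1 << 30          # sample offsets within the first 1 GiB
READS_PER_THREAD = 10000

def worker(fd):
    # pread is positional, so many threads can share one descriptor
    for _ in range(READS_PER_THREAD):
        offset = random.randrange(0, SPAN // BLOCK) * BLOCK
        os.pread(fd, BLOCK, offset)

def run(num_threads):
    fd = os.open(DEVICE, os.O_RDONLY)
    threads = [threading.Thread(target=worker, args=(fd,))
               for _ in range(num_threads)]
    start = time.time()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = time.time() - start
    os.close(fd)
    iops = (num_threads * READS_PER_THREAD) / elapsed
    print(f"{num_threads:3d} threads: {iops:,.0f} random reads/s")

# On flash, the reads/s figure typically grows with the thread count;
# on a single spinning disk it stays roughly constant.
for n in (1, 4, 16, 64):
    run(n)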
At a higher level, we are also digging into large data problems through data warehouse appliances. These appliances are parallel database systems that stripe data across many disk blades. While they rely on brute-force hardware, they present a SQL interface that hides the parallelization from the user. As such, we are currently
evaluating how well a system like Netezza performs with scientific datasets.
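To give a feel for why the SQL interface matters, here is a small sketch of the kind of query we hand to an appliance, written against a generic ODBC connection (the DSN, table, and column names are hypothetical). The query carries no parallel hints or data-placement directives; the striping and parallel execution stay entirely on the appliance side.

# Sketch: querying a data warehouse appliance through a generic ODBC
# connection. The DSN, table, and column names are hypothetical; the
# point is that the SQL contains no parallelization directives.
import pyodbc

conn = pyodbc.connect("DSN=warehouse")   # placeholder data source name
cur = conn.cursor()
cur.execute("""
    SELECT mesh_region, AVG(temperature), MAX(pressure)
    FROM simulation_results                -- hypothetical table
    WHERE timestep BETWEEN 100 AND 200
    GROUP BY mesh_region
""")
for region, avg_temp, max_pres in cur.fetchall():
    print(region, avg_temp, max_pres)
conn.close()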
Hardware Accelerators
Since approximately 2000, much of my work has focused on using special-purpose hardware devices to accelerate application performance. Until recently, FPGAs were the only accelerator technology that made sense.
As a reconfigurable computing researcher, I built custom FPGA computational pipelines composed of 50+ floating-point units.
On a well-designed system such as the
Cray XD1, we found that FPGAs could give 10x speedups over Opterons.
FPGAs ultimately lost the accelerator battle because they were difficult to program. These days the best
choice for accelerator research is a GPU, or possibly an embedded multicore device
(e.g., Tilera or Ambric). These chips are cheap
and easy to program. However, as with FPGAs, the main challenge is finding the right way to exchange
data between the hardware and the host application. In support of my large data work, I am currently investigating
how data can be moved efficiently between flash memory storage devices and a Tilera board.
Network Interface Hardware/Software
My PhD work involved the design and implementation of a low-level communication library named GRIM (General-purpose Reliable In-order Messages)
for cluster computers. GRIM employed a Myrinet
Network Interface (NI) as a communication broker between all resources in a cluster, including its CPUs, distributed memory,
and peripheral devices. This connectivity allowed us to implement complex pipelines on distributed cluster resources
(e.g., generate a video feed from a capture card, filter it through one or more FPGA or CPU resources, and then display the
results on a video card). GRIM featured a rich set of primitives (remote DMA, active
messages, and NI-based multicast), but still managed to deliver low-latency, high-bandwidth performance.
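For readers unfamiliar with the active-message idea, the sketch below shows the basic pattern: every message names a pre-registered handler, and the receiver runs that handler as soon as the data arrives instead of parking the bytes for a later receive call. This is a generic, host-side illustration only, not GRIM's actual interface or implementation.

# Generic active-message sketch (illustrative only; not GRIM's API).
# A message carries a handler ID plus a payload, and the receiver
# dispatches the matching handler on arrival.
import struct

HANDLERS = {}

def register(handler_id):
    """Associate a small integer ID with a handler function."""
    def wrap(fn):
        HANDLERS[handler_id] = fn
        return fn
    return wrap

def pack(handler_id, payload):
    return struct.pack("!HI", handler_id, len(payload)) + payload

def dispatch(message):
    handler_id, length = struct.unpack("!HI", message[:6])
    HANDLERS[handler_id](message[6:6 + length])

@register(1)
def store_block(payload):
    # e.g., deposit incoming data into a local buffer
    print("stored", len(payload), "bytes")

@register(2)
def start_filter(payload):
    # e.g., kick off a processing step on an attached device
    print("filter request:", payload.decode())

# A real NI would deliver the bytes over the network; here we loop back.
dispatch(pack(1, b"x" * 4096))
dispatch(pack(2, b"edge-detect frame 42"))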
Wireless Sensor Networks
I was first introduced to wireless sensor networks (WSNs) when my summer internship mentor at JPL asked me to
work through the logistics of deploying a WSN on Mars. It was a great project because it forced me to get out and interview
a number of science experts at JPL as well as dig through an amazing amount of literature. I wound up
building two simulators to test out distributed clustering algorithms. More recently, I've had the
opportunity to work on a few WSN research projects with actual hardware.
In these projects I helped write, debug, and measure the power consumption of on-node communication software.
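For a flavor of the clustering logic such simulators exercise, the sketch below runs one round of a simple probabilistic cluster-head election: each node elects itself head with a fixed probability and every other node joins the nearest head. It is an illustrative toy, not the specific algorithms from the Mars study.

# One round of probabilistic cluster-head election in a toy WSN
# simulator (illustrative only). Nodes self-elect with probability P;
# the remaining nodes attach to the closest elected head.
import math, random

P = 0.1            # desired fraction of cluster heads per round
NUM_NODES = 50
FIELD = 100.0      # square deployment area, in meters

nodes = [(random.uniform(0, FIELD), random.uniform(0, FIELD))
         for _ in range(NUM_NODES)]

heads = [i for i in range(NUM_NODES) if random.random() < P]
if not heads:                        # guarantee at least one head
    heads = [random.randrange(NUM_NODES)]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

clusters = {h: [] for h in heads}
for i, pos in enumerate(nodes):
    if i in clusters:
        continue
    nearest = min(heads, key=lambda h: dist(pos, nodes[h]))
    clusters[nearest].append(i)

for head, members in clusters.items():
    print(f"head {head}: {len(members)} member nodes")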