Work Bio

Craig Ulmer is a Computer Engineer in the Visualization and Scientific Computing group at Sandia National Laboratories in Livermore, California. He is currently investigating how revolutionary hardware technologies (flash memory, FPGAs, and GPUs) can be used to change the way data-intensive applications are designed. This work is an extension of a reconfigurable computing project that Craig led from 2003 to 2006, which used Field-Programmable Gate Arrays (FPGAs) to accelerate computational kernels that are important in scientific applications. During his five years at Sandia, Craig has worked in a variety of application domains, including high-performance computing (HPC) architectures, wireless sensor networks, post-processing tools for data discovery, and real-time network intrusion detection.

Prior to joining Sandia, Craig received a Ph.D. in Electrical and Computer Engineering from the Georgia Institute of Technology for his work on low-level communication libraries for cluster computers. This research resulted in a flexible message layer for Myrinet named GRIM that enables users to efficiently utilize hardware accelerators and multimedia devices that are distributed throughout a cluster. While attending Georgia Tech, Craig completed two years of internship and co-op work assignments at NASA's Jet Propulsion Laboratory, Eastman Kodak's Digital Technology Center, and IBM's EduQuest division.

Research Interests

In a survey course I took when I first started college, a professor described Computer Engineers as "tall skinny" people who need to know a little about a lot of different things (as opposed to other engineers, who know a lot about a little). While the upside of this field is that you get to dive into a number of different application domains, the downside is that it can be incredibly frustrating because there is never enough time to explore all of the interesting paths. My current short list of research interests includes the following.

Storage Technologies for Large Data Problems

In early 2007 we met a few guys from a startup named Fusion-io who were working on a high-performance PCIe storage product built from NAND flash memory. We became early evaluators of their hardware and confirmed that a single card could outperform a stack of hard drives. What's more, we found that flash performance actually improved as the workload increased. Since then I have been examining how data-intensive algorithms can be refactored to take advantage of flash memory. We've also been fortunate enough to obtain a few other pre-release flash devices from other vendors that give us a good idea of where the storage world is heading.
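The reason throughput improves under load is that a flash card has many independent NAND channels, so a deeper queue of outstanding requests keeps more of them busy at once. Here is a minimal sketch (not our actual benchmark; file names and sizes are made up) of how you might observe this effect by timing random reads at increasing thread counts:

```python
# Illustrative sketch: time random 4 KiB reads at increasing concurrency.
# On flash with many independent NAND channels, throughput tends to climb
# as the request queue deepens; a single spinning disk's does not.
import os
import time
from concurrent.futures import ThreadPoolExecutor

BLOCK = 4096  # 4 KiB, a common random-I/O unit

def read_block(path, offset):
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(BLOCK)

def throughput(path, offsets, workers):
    """Bytes/second achieved issuing the reads from `workers` threads."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(lambda off: read_block(path, off), offsets))
    elapsed = max(time.time() - start, 1e-9)
    return len(offsets) * BLOCK / elapsed

if __name__ == "__main__":
    path = "flash_scratch.bin"                    # hypothetical scratch file
    if not os.path.exists(path):
        with open(path, "wb") as f:
            f.write(os.urandom(8 * 1024 * 1024))  # 8 MiB of test data
    size = os.path.getsize(path)
    offsets = [(i * 7919 * BLOCK) % (size - BLOCK) for i in range(256)]
    for workers in (1, 4, 16, 64):
        mbps = throughput(path, offsets, workers) / 1e6
        print(f"{workers:3d} threads: {mbps:8.1f} MB/s")
```

In practice you would bypass the page cache (e.g., with direct I/O) to measure the device rather than the OS, but the shape of the curve is the point here.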

At a higher level, we are also digging into large data problems through data warehouse appliances. Both Netezza and XtremeData offer parallel database systems that stripe data across many disk blades. These systems rely on brute-force hardware, but they present a SQL interface that hides the parallelization task from the user. As such, we are currently evaluating how well a system like Netezza performs with scientific datasets.

Computational Accelerators

Since approximately 2000, a large amount of my work has focused on utilizing special-purpose hardware devices as a means of accelerating application performance. Until recently, FPGAs were the only technology that made sense. As a reconfigurable computing researcher I built custom computational pipelines for FPGAs comprising 50+ floating-point units. On a well-designed system such as the Cray XD1, we found that FPGAs could deliver 10x speedups over Opterons.

FPGAs ultimately lost the accelerator battle because they were difficult to program. These days the best choice for accelerator research is a GPU, or possibly an embedded multicore device (e.g., Tilera or Ambric). These chips are cheap and easy to program. However, similar to FPGAs, the main issue is finding the right way to exchange data between the hardware and the host application. In support of my large data work, I am currently investigating how data can be moved efficiently between flash memory storage devices and a Tilera board.

Network Interface Hardware/Software

My Ph.D. work involved the design and implementation of a low-level communication library named GRIM (General-purpose Reliable In-order Messages) for cluster computers. GRIM employed a Myrinet Network Interface (NI) as a communication broker between all resources in a cluster, including its CPUs, distributed memory, and peripheral devices. This connectivity allowed us to implement complex pipelines on distributed cluster resources (e.g., generate a video feed from a capture card, filter it through one or more FPGA or CPU resources, and then display the results on a video card). GRIM featured a rich set of primitives (remote DMA, active messages, and NI-based multicast), but still managed to deliver low-latency, high-bandwidth performance.
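The active-message primitive is worth a quick illustration. Rather than posting a matching receive, the sender names a handler that runs automatically when the message arrives. This toy Python sketch shows only the programming model (GRIM itself was a C library driving Myrinet NI firmware, and these names are invented for illustration):

```python
# Hedged sketch of the active-message idea: a message carries the name of
# a handler, and the receiving endpoint dispatches to it on arrival, so no
# explicit receive call is needed. In GRIM the NI brokered this delivery;
# here we simply dispatch in-process.
class Endpoint:
    def __init__(self):
        self.handlers = {}

    def register(self, name, fn):
        """Install a handler the remote side can invoke by name."""
        self.handlers[name] = fn

    def deliver(self, name, payload):
        """Simulate message arrival: run the named handler on the payload."""
        return self.handlers[name](payload)

node = Endpoint()
node.register("accumulate", lambda xs: sum(xs))
print(node.deliver("accumulate", [1, 2, 3]))  # -> 6
```

The appeal for heterogeneous clusters is that the same dispatch mechanism can target a CPU handler, an FPGA, or a multimedia device, which is what made the pipeline examples above possible.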

Wireless Sensor Networks

I was first introduced to wireless sensor networks (WSNs) when my summer internship mentor at JPL asked me to work through the logistics of deploying a WSN on Mars. It was a great project because it forced me to get out and interview a number of science experts at JPL as well as dig through an amazing amount of literature. I wound up building two simulators to test out distributed clustering algorithms. More recently, I've had the opportunity to work on a few WSN research projects with actual hardware. In these projects I helped write, debug, and measure the power consumption of on-node communication software.
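Those simulators are long gone, but the flavor of algorithm they exercised is easy to sketch: a LEACH-style randomized cluster-head election in which each node volunteers as a head with some probability and everyone else joins the nearest head. The parameters and node model below are illustrative, not taken from the JPL work:

```python
# Toy LEACH-style clustering round. Each node independently volunteers as
# a cluster head with probability p; remaining nodes join the nearest head
# by Euclidean distance. All parameters here are illustrative.
import math
import random

def elect_heads(n_nodes, p=0.1, rng=None):
    """Randomized self-election of cluster heads."""
    rng = rng or random.Random(0)
    heads = [i for i in range(n_nodes) if rng.random() < p]
    return heads or [rng.randrange(n_nodes)]  # guarantee at least one head

def assign_clusters(positions, heads):
    """Non-head nodes join the nearest cluster head."""
    clusters = {h: [] for h in heads}
    for i, pos in enumerate(positions):
        if i in clusters:
            continue
        nearest = min(heads, key=lambda h: math.dist(pos, positions[h]))
        clusters[nearest].append(i)
    return clusters

rng = random.Random(42)
positions = [(rng.random(), rng.random()) for _ in range(50)]
heads = elect_heads(50, rng=rng)
clusters = assign_clusters(positions, heads)
```

The interesting questions in a real deployment, and the reason for building simulators at all, are how p, node density, and radio range trade off against network lifetime once energy costs are modeled.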

Disclaimer: This information is based entirely on my own views and not my employer's.
Last modified: November 19, 2008.


Well there you go, you found me. Or at least a dull web page where I talk about myself in the third person a lot. If you like the first person better, you might want to check out