For the last three years I've been involved in one of two LDRD projects that investigated how FPGAs could be leveraged as computational accelerators in HPC platforms. Keith Underwood and Scott Hemmert's LDRD project looked at building streamlined floating-point designs that would allow the labs to place custom compute flows in an FPGA. My LDRD project looked at integration issues that arise from offloading computations into the FPGA (e.g., how do you exchange data between the host and FPGA? Where is the right place to put the FPGA in the architecture?). The work was interesting for me because the problem space was broad. I had to work with applications people to find and build kernels to offload from the application. I investigated new platforms such as the Cray XD1 that offered tight coupling between the host CPU and accelerator. I built network interfaces for FPGAs using their new multi-gigabit transceivers so we could place FPGAs in the network. Overall it was a lot of hard work with low-level hardware, but we covered a lot of material in a short time. At the end of the LDRDs, Keith and I decided to write a joint technical report about all the different topics we explored.
Field-programmable gate arrays (FPGAs) have been used as alternative computational devices for over a decade; however, they have not been used for traditional scientific computing due to their perceived lack of floating-point performance. In recent years, there has been a surge of interest in alternatives to traditional microprocessors for high-performance computing. Sandia National Labs began two projects to determine whether FPGAs would be a suitable alternative to microprocessors for high-performance scientific computing and, if so, how they should be integrated into the system. We present results that indicate that FPGAs could have a significant impact on future systems. FPGAs have the potential to deliver order-of-magnitude performance gains on several key algorithms; however, there are serious questions as to whether the system integration challenge can be met. Furthermore, there remain challenges in FPGA programming and system-level reliability when using FPGA devices.