Data Warehouse Appliances for Mesh Analysis

2010-01-07 io mesh pub

In the Storage Intensive Computing Architectures for In-situ Data Analysis (SICAIDA) project, we looked at multiple ways to use data warehouse appliances to perform mesh-based analytics. We ported mesh analytics to Netezza (Mustang and TwinFin), XtremeData dbX (1008 and 1017), LexisNexis (DAS-20 and DAS-60), and Hadoop (Local and Amazon Cloud). In the end, Hadoop gave us the most flexibility at the lowest cost. We wrote a paper about it for HICSS, and then reported on a broader study in the SAND2010-7471 technical report.

Abstract

As scientific computing users migrate to petaflop platforms that promise to generate multi-terabyte datasets, there is a growing need in the community to be able to embed sophisticated data analysis algorithms in the storage systems for the computing platforms. Data Warehouse Appliances (DWAs) are an attractive option for this work, due to their ability to process massive datasets efficiently. While DWAs have been proven effective in data mining and informatics applications, there are relatively few examples of how DWAs can be integrated into the scientific computing workflow. In this paper we present our experiences in adapting two mesh analysis algorithms to function on two different DWAs: a SQL-based Netezza database appliance and a Map/Reduce-based Hadoop cluster. The main contribution of this work is insight into the differences between the two platforms' programming environments. In addition, we present performance measurements for entry-level DWAs to help provide a first-order comparison of the hardware.
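To make the contrast between the two programming environments concrete, here is a minimal sketch in Python. It is not one of the two algorithms from the paper. It uses a hypothetical stand-in analytic (counting how many mesh elements touch each node) on a made-up two-triangle mesh. The map/reduce version runs in-process rather than on Hadoop, and the SQL version uses Python's built-in sqlite3 module as a stand-in for a SQL appliance. The table name, column names, and function names are all assumptions chosen for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical toy mesh: two triangles that share the edge between nodes 1 and 2.
# Keys are element ids, values are the node ids that make up each element.
ELEMENTS = {
    0: (0, 1, 2),
    1: (1, 3, 2),
}


def map_element(elem_id, node_ids):
    """Map step: emit one (node_id, 1) pair for every node of an element."""
    for node_id in node_ids:
        yield node_id, 1


def reduce_counts(node_id, values):
    """Reduce step: sum the values collected for a single node."""
    return node_id, sum(values)


def node_valence_mapreduce(elements):
    """Run map, shuffle (group by key), and reduce in a single process."""
    groups = defaultdict(list)
    for elem_id, node_ids in elements.items():
        for key, value in map_element(elem_id, node_ids):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


def node_valence_sql(elements):
    """Same analytic as a GROUP BY query (sqlite3 stands in for the appliance)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE element_nodes (elem_id INTEGER, node_id INTEGER)")
    rows = [(e, n) for e, nodes in elements.items() for n in nodes]
    conn.executemany("INSERT INTO element_nodes VALUES (?, ?)", rows)
    cursor = conn.execute(
        "SELECT node_id, COUNT(*) FROM element_nodes GROUP BY node_id"
    )
    result = dict(cursor)  # each row is a (node_id, count) pair
    conn.close()
    return result


if __name__ == "__main__":
    print(node_valence_mapreduce(ELEMENTS))
    print(node_valence_sql(ELEMENTS))
```

In the SQL form the grouping is declared and the database decides how to execute it. In the map/reduce form the programmer writes the per-record and per-key functions directly, which takes more code but leaves more room for custom logic.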

Publications

HICSS Paper: Craig Ulmer, Greg Bayer, Yung Ryn Choe, and Diana Roe, "Exploring Data Warehouse Appliances for Mesh Analysis Applications", Hawaii International Conference on System Sciences 2010.
SAND Report: Craig Ulmer, Greg Bayer, Yung Ryn Choe, and Diana Roe, "Scientific Data Analysis on Data-Parallel Platforms", Sandia Report SAND2010-7471.