One of my projects this year focused on designing an institutional computing platform that could support the research needs of multiple Emulytics communities at Sandia. Emulytics is a technique that security researchers use to evaluate how software and firmware behave when run at enterprise-network and even Internet scale. Researchers typically launch tens of thousands of VMs on a collection of bare-metal compute nodes and then establish virtual networks to connect the VMs in a way that models real-life scenarios. Emulytics researchers have used this approach to answer a variety of questions, ranging from "What will this malware do when it starts to detonate?" to "Will our corporate video conference system crash again when we host an all-hands talk about cutting benefits?"
Architecting an institutional platform for Emulytics was interesting because we had to take both technical and political issues into consideration. After the initial success of some grassroots efforts, multiple groups had stood up their own platforms and developed software that met the needs of their specific customers. We had to interview the different communities to learn what they needed and work through how they would share a larger platform with other research groups. I believe it helped that some of us were outsiders to this work, and that we had a responsibility to make a platform that wouldn't be owned by one of the existing research groups. We put this document together to give a high-level view of how things would work in this platform.
In this document we describe a reference architecture developed for Emulytics clusters at Sandia National Laboratories. Taking into consideration the constraints of our Emulytics software and the requirements for integration with the larger computing facilities at Sandia, we developed a cluster platform suitable for use by Sandia's several Emulytics toolsets and also useful for more general large-scale computing tasks.