A few months ago I bought my son a Creality LD-002H Resin 3D Printer so we could play around with printing some simple objects. Like most people, we watched a lot of videos to get a better handle on how to (safely) work with resin, printed a bunch of example objects from Thingiverse, and then watched more videos to figure out how to make better prints. While printing models has been fun, I'd really like the kids to get a better handle on how to create their own objects through photogrammetry tools like Meshroom or editors like Blender.
The last few weeks I've been working through the details of how I could convert a CT scan of me into something I could print. Through a combination of open-source tools I was able to generate a mesh model of my pelvis and print a small version of it. I seem to forget how to use all these tools after a few days, so this post is just some notes for me to be able to recreate the process in the future.
Viewing CT Data in 3D Slicer
Back in 2013 I had to go to the hospital because I had a bad infection that needed surgery. The doctors ran me through a CT machine to get a better view of what was happening inside me. After my surgery I learned that Kaiser will burn a CD with your data on it for only $20 if you ask them. Being curious, I ordered a copy and poked around with it for a bit. While the data isn't in a format that I recognized, a viz friend pointed me at a tool called 3D Slicer (based on VTK) that's designed to look at medical data. It's a little confusing to get started, but you basically:
I always stumble around with the default view settings. Usually the problem is that I haven't loaded a DICOM entry with anything in it (eg, a patient record) or I forgot to tell Slicer to view the data by turning the volume's closed eye into an open eye. Once the 3D data shows up, I switch to a 3D-only view by selecting View and Layout.
Extracting and Smoothing Contours
While the volume rendering tool is a nice way to poke through the data, what you really need to do is extract the contour for the bone so you can export it as a mesh (ie, isosurfacing). To do this you need to run the Segmentation tool to build a mesh and then smooth it to make it more printable.
It took some trial and error to find a threshold value that produced a good view of my pelvis (too low and you get tissue, too high and the bone starts disappearing). The two main issues I saw with segmentation were that some regions had small holes and some curving regions were bumpy like corduroy. Selecting the Smoothing option from the Segmentation options helped fix some of these problems. I'm sure you could do a lot more here to fix things- some of the videos I watched showed how to get precise segmentation by hand labeling points. I'm not printing an actual hip replacement so I didn't go into too much detail. Once I was done I clicked on the Segmentations button and exported to STL.
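To make the threshold trade-off concrete, here is a minimal Python sketch of what thresholding does to a single row of voxels. It is not how Slicer implements its Segmentation tool; the intensity values, the `segment` helper, and the three thresholds are all made up for illustration (a real scan is a 3D volume with its own value range).

```python
# Hypothetical toy data: one row of CT intensities. These numbers are
# invented for illustration and do not come from a real scan.
AIR = -1000
SOFT_TISSUE = 40
THIN_BONE = 250
DENSE_BONE = 900

row = [AIR, SOFT_TISSUE, SOFT_TISSUE, THIN_BONE, DENSE_BONE,
       DENSE_BONE, THIN_BONE, SOFT_TISSUE, AIR]


def segment(values, threshold):
    """Return a 0/1 mask marking voxels at or above the threshold."""
    return [1 if v >= threshold else 0 for v in values]


def count_selected(values, threshold):
    """Count how many voxels the threshold keeps."""
    return sum(segment(values, threshold))


# Too low keeps soft tissue, too high drops the thin bone.
for threshold in (0, 200, 500):
    print(threshold, segment(row, threshold))
```

With the lowest threshold the soft tissue is swept into the mask, while the highest one keeps only the dense bone and leaves gaps where the thin bone was, which is the same kind of hole that can show up in a printed model.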
Cleanup in MeshLab
I pulled the STL file into MeshLab to verify it looked ok in another meshing tool and do some additional cleanup. While I think Slicer can do this cleanup, MeshLab seemed like an easier way to make some hand edits. I looked around in the mesh and manually removed some leftover polygons.
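I did this cleanup by hand, but the idea of dropping leftover polygons can also be sketched in code. The Python below is an illustration only, not MeshLab's implementation: it assumes the mesh has already been converted to indexed triangles (STL stores raw coordinates per triangle, so vertices would need to be merged first), and the function name and `min_triangles` parameter are hypothetical.

```python
from collections import defaultdict


def remove_small_components(triangles, min_triangles):
    """Keep only triangles in connected pieces with >= min_triangles faces.

    triangles: list of (i, j, k) vertex-index tuples. Two triangles count
    as connected when they share at least one vertex index.
    """
    parent = {}

    def find(v):
        # Union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Merge the three vertices of every triangle into one set.
    for i, j, k in triangles:
        union(i, j)
        union(j, k)

    # Count how many triangles land in each connected piece.
    sizes = defaultdict(int)
    for tri in triangles:
        sizes[find(tri[0])] += 1

    return [tri for tri in triangles if sizes[find(tri[0])] >= min_triangles]


# Three connected triangles plus one floating triangle (made-up indices).
mesh = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (10, 11, 12)]
print(remove_small_components(mesh, 2))
```

In this toy mesh the floating triangle `(10, 11, 12)` is dropped because its piece has only one face. On a real scan you would want a much larger cutoff, chosen by looking at the mesh.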
Adding Supports in Chitubox
The final step for me was to load the model into Chitubox to turn it into a printable object. Chitubox is pretty amazing- it'll analyze an object and figure out what supports need to be added to make it printable on a 3D printer. My neighbor does a lot of 3D printing and gave me a lot of tips on making good prints with Chitubox, eg, make the back side of the model face the build plate so the supports don't leave bumps in important places, angle the structure to make it easier to build, use the skate support platform shape to make it easier to remove, etc.
I manually placed the pelvis model and rotated it slightly, but pretty much used the default settings everywhere else. Even with a ton of supports, Chitubox still wasn't too happy with the printing risk. Once it was done I adjusted the print's first layer exposure time to 60s and the remaining layers to 6s. Based on previous prints, if you don't expose the first layer for long enough it doesn't stick to the build plate and your print fails.
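The exposure settings feed directly into how long a print takes. Below is a rough Python sketch of that arithmetic. The 60s and 6s defaults are the exposure times from my settings, but everything else is an assumption: the function name, the layer count, and the per-layer overhead for moving the build plate are placeholder values, not numbers from Chitubox or from this print.

```python
def estimate_print_seconds(num_layers,
                           first_exposure_s=60.0,
                           normal_exposure_s=6.0,
                           first_layers=1,
                           per_layer_overhead_s=0.0):
    """Rough print-time estimate: exposure time plus per-layer overhead.

    per_layer_overhead_s is a hypothetical stand-in for the time the
    printer spends lifting and lowering the build plate on each layer.
    """
    if num_layers <= 0:
        return 0.0
    first = min(first_layers, num_layers)
    exposure = first * first_exposure_s + (num_layers - first) * normal_exposure_s
    return exposure + num_layers * per_layer_overhead_s


# Illustrative inputs only: 1000 layers and 8s of motion per layer.
seconds = estimate_print_seconds(1000, per_layer_overhead_s=8.0)
print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

The main takeaway is that the long first-layer exposure barely affects the total. The layer count and the per-layer motion dominate, which is why laying a model flatter usually prints faster.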
Printing, Cleaning, and Curing
The final step was to take the model to the printer and print it out. Resin printing is a pretty messy and dangerous process. You align the print head (crucial!), pour resin into the vat, load the vat into the printer, and then print the model. This print took about four hours, but I didn't have to babysit it after I verified the first few layers had stuck. When it finished, I chiseled the part off the build plate and dropped it in a pickle jar (w/ strainer) filled with isopropyl to clean off the uncured resin. After a lot of shaking, I took it out and clipped off all the supports (which is unusually satisfying). From there I did another quick rinse in isopropyl, patted it down with paper towels, and put it under a UV light to cure it.
I'm still new to 3D printing but I think it looks pretty decent. The main problem with this print is that I scratched it up quite a bit when I was cleaning it up with paper towels (the print is still pretty soft at this point). In the future I'll probably buy a curing station which simplifies a lot of the cleaning and curing problems. The other problem with the print is that there are some extra holes in the back because I set the contouring threshold value too high. At least that's what I hope- maybe I just have really thin bones.
Safety and Cleanup
I should point out that the resin I'm using is toxic when uncured and it's important to follow safety procedures when doing this kind of printing. I wear disposable nitrile gloves and safety glasses whenever I work with uncured resin, and crack the garage door to help vent the area. When it comes time to handle wet prints, my neighbor suggested a practice of having one clean hand and one dirty hand, since there's always something you need to grab and you want to minimize what gets dirty. I use an excessive amount of isopropyl to clean up the build plate, vat, and tools when I'm done. Fortunately, you can just leave all the dirty towels and gloves in the sun for 30mins to cure them and make them safe for disposal.
Recently I've been working with Carlos Maltzahn at the University of California, Santa Cruz (UCSC) on a new DOE ASCR project that is exploring how programmable network interface cards (or SmartNICs) can be used to improve data management services in HPC/HPDA platforms. To get a better idea of what current hardware can do, Jianshen Liu conducted a number of network and CPU experiments on NVIDIA BlueField-2 SmartNICs located in NSF's CloudLab. While the ARMs on these cards are slow compared to modern host processors, they are sufficient for performing the kinds of operations we routinely need in the I/O space. Jianshen summarized what we learned in the following arXiv paper (UUR SAND2021-5854R).
High-performance computing (HPC) researchers have long envisioned scenarios where application workflows could be improved through the use of programmable processing elements embedded in the network fabric. Recently, vendors have introduced programmable Smart Network Interface Cards (SmartNICs) that enable computations to be offloaded to the edge of the network. There is great interest in both the HPC and high-performance data analytics communities in understanding the roles these devices may play in the data paths of upcoming systems.
This paper focuses on characterizing both the networking and computing aspects of NVIDIA's new BlueField-2 SmartNIC when used in an Ethernet environment. For the networking evaluation we conducted multiple transfer experiments between processors located at the host, the SmartNIC, and a remote host. These tests illuminate how much processing headroom is available on the SmartNIC during transfers. For the computing evaluation we used the stress-ng benchmark to compare the BlueField-2 to other servers and place realistic bounds on the types of offload operations that are appropriate for the hardware.
Our findings from this work indicate that while the BlueField-2 provides a flexible means of processing data at the network's edge, great care must be taken to not overwhelm the hardware. While the host can easily saturate the network link, the SmartNIC's embedded processors may not have enough computing resources to sustain more than half the expected bandwidth when using kernel-space packet processing. From a computational perspective, encryption operations, memory operations under contention, and on-card IPC operations on the SmartNIC perform significantly better than the general-purpose servers used for comparisons in our experiments. Therefore, applications that mainly focus on these operations may be good candidates for offloading to the SmartNIC.
There was a lot of talk on local social media this week about a large military plane that flew over Livermore at a very low altitude. The town is already wound up about larger planes flying into Livermore because there's an expansion plan being discussed that would allow private charter 737s to land at the airport. Seeing a massive, loud military plane flying low over town made everyone wonder if that's what daily life is going to be like in the next few years.
There was a lot of speculation about what was going on with this flight. Some people thought it was emergency vaccine supplies being dropped off. Others claimed it was a military salute for a Lt. Col. in Pleasanton who was just awarded the Distinguished Flying Cross. In the end it turned out to be a C-17 Globemaster doing a practice landing approach at our municipal airport (see Brodie Brazil's video of the approach).
I went to my PiAware node and pulled the day's data to see what military flights took place on Wednesday. I found AE07E0 was active around the 4pm time period people were talking about. Interestingly, the flight used the call sign SOUND87, which made me wonder if this was some kind of sound test for the airport extension (I doubt it now- it seems like a routine test). As the plots below show, the plane flew in from the east, passed the airport, and made a sharp turn north. Zooming in on the east side of town, I noticed it flew right over LLNL at about 2K ft, just a block away from their $3.5B National Ignition Facility (NIF). I'd thought they had a no-fly zone over them, but that seems to only be for drones. Here is the raw data for the flight.
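Pulling one plane out of a day's worth of position reports is mostly a filter on the ICAO hex ID. Here is a minimal Python sketch of that step. The record layout (`hex`, `flight`, `time`, `alt_ft` keys) is a hypothetical format chosen for illustration, not necessarily what my PiAware logs look like, and the sample values are placeholders rather than the real flight data.

```python
def track_for(records, icao_hex):
    """Return the records for one aircraft, sorted by time.

    records: list of dicts with hypothetical keys "hex", "flight",
    "time", and "alt_ft". The hex ID match is case-insensitive.
    """
    wanted = icao_hex.strip().lower()
    hits = [r for r in records if r.get("hex", "").strip().lower() == wanted]
    return sorted(hits, key=lambda r: r["time"])


def lowest_altitude(track):
    """Return the lowest reported altitude in a track, or None if empty."""
    alts = [r["alt_ft"] for r in track if r.get("alt_ft") is not None]
    return min(alts) if alts else None


# Placeholder records: times and altitudes are made up.
records = [
    {"hex": "ae07e0", "flight": "SOUND87", "time": 30, "alt_ft": 2500},
    {"hex": "a12345", "flight": "TEST123", "time": 10, "alt_ft": 9000},
    {"hex": "AE07E0", "flight": "SOUND87", "time": 20, "alt_ft": 3000},
    {"hex": "ae07e0", "flight": "SOUND87", "time": 40, "alt_ft": None},
]

track = track_for(records, "AE07E0")
print([r["time"] for r in track], lowest_altitude(track))
```

Matching on the hex ID rather than the call sign is deliberate in this sketch, since a call sign like SOUND87 can change between flights while the hex ID stays with the airframe.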
Debunking the Salute Theory
The idea that the military would dive bomb a city to show its appreciation of a soldier bothered me, so I went to FlightRadar24 and pulled up the data for the whole flight. As seen below, they took off from Vegas, circled the Bay Area, and then dropped in on Livermore. After that, they flew north to Concord and did a similar practice approach at Concord's municipal airport (CCR) before landing at Travis AFB. Given that they didn't fly anywhere near Pleasanton and they made a second drop somewhere else, I'd guess this has nothing to do with the Lt. Col.'s award.
Questioning the Airport Expansion
One thing this flight really highlights is that Livermore people will notice larger planes flying to our airport. While the C-17 is much bigger and louder than the 737s the expansion is targeting, it made a lot of people realize that the airport approach really does stretch all the way across town, starting at the $3.5B big science experiment at the lab. I hope enough people stand up to the FAA and prevent larger planes from being able to land there.
Last Fall we purchased some NVIDIA Ampere A100 GPU cards to get a better understanding of how much they might impact some of our data-intensive workloads. Stefan Seritan dug into the details and put together this performance evaluation report.
The performance of NVIDIA's latest A100 graphics processing unit (GPU) is benchmarked for computing and data analytic workloads relevant to Sandia's missions. The A100 is compared to previous generations of GPUs, including the V100 and K80, as well as multi-core CPUs from two generations of AMD's EPYC processors, Zen and Zen 2. Computing workloads such as sparse matrix operations (e.g. HPCG benchmark) and numerical solver-heavy applications based on Trilinos and Kokkos see moderate 1.5x to 2x speedups compared to the V100, consistent with the increased core count and memory bandwidth of the A100. Training and inference on machine learning (ML) models such as ResNet-50 for image classification and BERT-Large for natural language processing show the same 2x speedup over the V100.
However, these ML workloads also benefit from increased tensor core capabilities in the V100 and A100 GPUs, yielding a 3.5x speedup using a mixed (single + half) precision strategy for floating point operations. While the performance gap between GPUs and CPUs remains moderate (3x to 8x) for high-performance computing applications, these new hardware features of recent GPU generations give 50x to 100x speedups in out-of-the-box ML workloads compared to CPUs. With additional A100 features still undergoing testing (INT8, structural sparsity, multi-instance GPUs) with clear applications for ML workloads, the A100 GPU seems an extremely promising hardware accelerator for artificial intelligence (AI) and data analytics research at Sandia.
Joe Kenny has worked hard over the last few years to understand the tradeoffs of building a 100GigE network fabric for cluster computers that can run regular TCP-based applications and HPC applications that use MPI and RDMA. He put a paper together for INDIS at SC this year that ties together some of the simulation work he did with Jeremy Wilke and some of the practical experiments we've been doing on real hardware.
Priority-based Flow Control (PFC), RDMA over Converged Ethernet (RoCE) and Enhanced Transmission Selection (ETS) are three enhancements to Ethernet networks which allow increased performance and may make Ethernet attractive for systems supporting a diverse scientific workload. We constructed a 96-node testbed cluster with a 100 Gb/s Ethernet network configured as a tapered fat tree. Tests representing important network operating conditions were completed and we provide an analysis of these performance results. RoCE running over a PFC-enabled network was found to significantly increase performance for both bandwidth-sensitive and latency-sensitive applications when compared to TCP. Additionally, a case study of interfering applications showed that ETS can prevent starvation of network traffic for latency-sensitive applications running on congested networks. We did not encounter any notable performance limitations for our Ethernet testbed, but we found that practical disadvantages still tip the balance towards traditional HPC networks unless a system design is driven by additional external requirements.