One problem I ran into on every PCI card I worked with in grad school was that transferring data from the host to the card efficiently takes a lot of work. I wound up writing a library that implemented multiple programmed I/O (PIO) data transfer tricks, along with mechanisms for selecting which injection method to use based on host performance.
A key task in providing high performance in cluster computers is efficiently transferring data between cluster resources. This study focuses on one component of the communication pipeline: the host-to-peripheral-card interface. As Moore's Law continues to progress, we are seeing successive generations of clusters with increasing compute power and communications bandwidth, but with roughly the same I/O systems. Communication software is continually being re-optimized for each new generation of hardware.
In this paper we describe a tunable library for host-to-device communication. The library profiles the performance characteristics of the host's hardware environment and uses this information to automatically configure host-to-device transfer mechanisms. In addition to taking advantage of CPU-specific features, the library factors the I/O characteristics of individual peripheral devices into its data transfer optimizations. We demonstrate the benefit of the library with measurements and experiences from three generations of clusters.
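The profile-then-select idea can be illustrated with a small sketch. This is not the library's API: the function names (`profile`, `select`, `transfer`) and the three copy routines are hypothetical, and the routines are plain in-memory Python stand-ins for real PIO injection methods. The sketch only shows the general shape, assuming the library times each candidate method at several transfer sizes and records the fastest one for later use.

```python
import time

# Hypothetical stand-ins for PIO injection methods. A real library would
# write to a memory-mapped device region instead of a bytearray.
def copy_bytewise(dst, src):
    # One element per store, like a simple word-at-a-time write loop.
    for i in range(len(src)):
        dst[i] = src[i]

def copy_chunked(dst, src, chunk=64):
    # Fixed-size bursts, like writes grouped into cache-line-sized units.
    view = memoryview(src)
    for off in range(0, len(src), chunk):
        dst[off:off + chunk] = view[off:off + chunk]

def copy_bulk(dst, src):
    # A single large copy.
    dst[:len(src)] = src

METHODS = {
    "bytewise": copy_bytewise,
    "chunked": copy_chunked,
    "bulk": copy_bulk,
}

def profile(methods, sizes, repeats=5):
    """Time every method at every size; return {size: fastest method name}."""
    table = {}
    for size in sizes:
        src = bytes(i % 256 for i in range(size))
        best_name, best_time = None, float("inf")
        for name, fn in methods.items():
            dst = bytearray(size)
            for _ in range(repeats):
                start = time.perf_counter()
                fn(dst, src)
                elapsed = time.perf_counter() - start
                # Keep the minimum over repeats to reduce timing noise.
                if elapsed < best_time:
                    best_name, best_time = name, elapsed
        table[size] = best_name
    return table

def select(table, size):
    """Pick the method profiled at the smallest size >= the request,
    falling back to the largest profiled size."""
    for profiled in sorted(table):
        if size <= profiled:
            return table[profiled]
    return table[max(table)]

def transfer(table, dst, src):
    """Copy src into dst using the profiled choice; return the method name."""
    name = select(table, len(src))
    METHODS[name](dst, src)
    return name

if __name__ == "__main__":
    table = profile(METHODS, sizes=[64, 1024, 16384])
    payload = bytes(range(200))
    target = bytearray(len(payload))
    print(table, transfer(table, target, payload))
```

The selection table is built once per host, so the cost of profiling is paid at configuration time rather than on every transfer. Which method wins at each size depends on the machine, which is the point of profiling rather than hard-coding a choice.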