Two posts in one day. They cover two different subjects, so I thought I should separate them. A while back I posted about HPC and Value (http://coreliuminc.com/2010/11/hpc-and-value/). There’s a link partway down where NCSA Director Thom Dunning talks about GPU being the future of supercomputing (http://insidehpc.com/2010/11/02/ncsa-director-gpu-is-the-future-of-supercomputing/). In the last few days I’ve been looking over the toolkit overview of NVIDIA’s CUDA 4.0.
Here is Director Dunning, as quoted by InsideHPC: “Programming these machines to do [GPU] calculations is still a very substantial effort. There will be some applications that will be rewritten to use GPUs [but] a lot of times it will be only part of an application that will use it so you won’t get nearly the power and computing advantage of running it all on the GPU,” he said.
Read through that toolkit overview and read that quote again. I think NVIDIA just cracked it wide open. The CUDA SDK just got all the parts it needs to be a mainstream, plug-in component of regular software development. C++ has become the language of application parallelism (that should get the language lawyers foaming at the mouth, but notice I said “application parallelism” and not “HPC”). Intel TBB is C++, Microsoft’s Parallel Patterns Library is C++, and now CUDA has pretty much full C/C++ support, with virtual functions (awesome, thank you NVIDIA devs), no-copy pinning of host memory and a unified virtual address space spanning CPU and GPU. On top of that, a single host thread can access any or all of the GPUs (which means that from anywhere in your code you can fire up GPU calcs across as many GPUs as you have).
I’m going to be on this as soon as I can get my hands on a 4.0 release candidate. Our APIs were designed with GPU in mind because in large-scale simulation there are often a whole lot of small, simple nonlinear calculations, repeated by the hundreds of thousands, that are ideal for handing off to a GPU (small equations, hydrocarbon mixture thermodynamics, etc.). Our API design is based on not having to care where client code is running or what’s behind an interface, and the 4.0 CUDA SDK means that most of our glue code to pin objects and data selectively to the GPU can now be thrown away. There’s some significant redesign needed to capitalise on 4.0’s features (virtual functions and unified address space mostly), but being able to offer a GPU option just got a lot easier.
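To make the "don’t care what’s behind the interface" idea concrete, here is a minimal C++ sketch. It is not our actual API: the names `BatchEvaluator`, `CpuEvaluator`, `make_evaluator` and `sum_of_roots` are made up for illustration, and the toy equation x^3 + x = c just stands in for the kind of small nonlinear calculation that gets repeated many thousands of times. Only a CPU backend is shown so the sketch builds without the CUDA toolkit; the assumption is that a GPU-backed class would implement the same interface.

```cpp
#include <cmath>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical interface: a batch of small, independent nonlinear solves.
// Client code sees only this; it does not know where the work runs.
class BatchEvaluator {
public:
    virtual ~BatchEvaluator() = default;
    // For each c in rhs, return the real x that satisfies x^3 + x = c.
    virtual std::vector<double> solve_all(const std::vector<double>& rhs) const = 0;
};

// CPU backend: one Newton solve per entry, each independent of the others.
// A GPU backend (not shown) could derive from BatchEvaluator the same way.
class CpuEvaluator : public BatchEvaluator {
public:
    std::vector<double> solve_all(const std::vector<double>& rhs) const override {
        std::vector<double> out(rhs.size());
        for (std::size_t i = 0; i < rhs.size(); ++i) {
            out[i] = solve_one(rhs[i]);
        }
        return out;
    }

private:
    static double solve_one(double c) {
        double x = c;  // starting guess
        for (int iter = 0; iter < 100; ++iter) {
            const double f = x * x * x + x - c;  // residual
            if (std::fabs(f) < 1e-12) break;     // converged
            x -= f / (3.0 * x * x + 1.0);        // Newton step; derivative is always >= 1
        }
        return x;
    }
};

// Hypothetical factory: the one place that would choose CPU or GPU.
std::unique_ptr<BatchEvaluator> make_evaluator() {
    return std::make_unique<CpuEvaluator>();
}

// Client code: written once against the interface, unchanged if the backend changes.
double sum_of_roots(const BatchEvaluator& ev, const std::vector<double>& rhs) {
    double total = 0.0;
    for (double x : ev.solve_all(rhs)) total += x;
    return total;
}
```

Client code would call something like `auto ev = make_evaluator(); ev->solve_all({2.0, 10.0});` and never mention a device. The point of the sketch is that each solve is independent of its neighbours, which is what makes this sort of batch a natural candidate for handing to a GPU behind the same interface.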