Friday, November 19, 2010
China takes top supercomputer spot
It is now well known that China has taken the top supercomputer spot with a gigantic GPU cluster. It's not surprising that GPUs can power the Linpack benchmark to such great heights. Linpack benefits from dividing big tasks into smaller ones, a technique known as "blocking". This is not an embarrassingly parallel breakdown, so it doesn't necessarily predict the performance you'd see on data-parallel benchmarks: network bandwidth and latency, on-server bandwidth and latency, and on-chip bandwidth and latency are all put to the test. Where Nvidia really delivers value is memory bandwidth: roughly 200 GBytes/sec per GPU, versus about 8-12 GBytes/sec for a commodity server node. With that much bandwidth the GPUs can approach their peak SIMD throughput, and on double-precision SIMD floating-point operations they just scream relative to commodity processors.
Linpack is very close to the type of application GPUs were originally invented for. With Nvidia slowly being squeezed out of the discrete graphics processor business by Intel and AMD integrating increasingly better graphics onto the CPU die, Nvidia had to branch out and add supercomputer-like capabilities to its graphics cards to capture market share in the HPC space: double precision, ECC, caching, and other features. I wouldn't have guessed the result would be a migration of the top supercomputer spot to China, but that is indeed what has happened.
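To make the "blocking" idea concrete, here is a minimal sketch (my own illustration, not the actual HPL code) of a blocked matrix multiply: a large product is split into tile-sized sub-products so each tile's data fits in fast local memory (cache on a CPU, shared memory on a GPU), and each tile product is a small, dense, regular task that maps naturally onto SIMD hardware. The function name `blocked_matmul` and the block size are arbitrary choices for the example.

```python
import numpy as np

def blocked_matmul(A, B, block=64):
    """Compute A @ B by accumulating block-sized tile products.

    Illustrative only: real Linpack/HPL blocking also distributes the
    tiles across nodes, which is where network bandwidth and latency
    enter the picture.
    """
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, block):
        for j in range(0, m, block):
            for p in range(0, k, block):
                # Each tile product touches only block*block-sized pieces
                # of A, B, and C, so the working set stays in fast memory.
                C[i:i+block, j:j+block] += (
                    A[i:i+block, p:p+block] @ B[p:p+block, j:j+block]
                )
    return C
```

The result is identical to an unblocked `A @ B`; only the order in which memory is touched changes, which is exactly why the technique rewards hardware with high memory bandwidth.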
Posted by Andrew Felch at 9:00 AM