Nvidia announces server ‘superchips,’ with and without GPUs
Nvidia’s Grace superchips cater to AI as well as legacy high-bandwidth applications not optimized for GPUs.
At its GPU Technology Conference (GTC) last year, Nvidia announced it would come out with its own server chip, called Grace, based on Arm’s Neoverse server architecture and the Armv9 instruction set. Details were scant at the time, but this week Nvidia filled them in, and they are remarkable.
With Grace, customers have two options, both dubbed superchips by Nvidia. The first is the Grace Hopper Superchip, which was formally introduced last year but only broadly described. It consists of a 72-core CPU and a Hopper H100 GPU tightly connected by Nvidia’s new high-speed NVLink-C2C chip-to-chip interconnect, which offers 900GB/s of bandwidth.
The second, announced this week, is the Grace CPU Superchip, which has no GPU. Instead, it has two 72-core CPU chips (144 cores in total) tied together via NVLink. Even without the H100 GPU, the Grace CPU Superchip posts some pretty good benchmarks. Nvidia claims SPECrate2017_int_base performance more than 1.5x higher than that of the dual high-end AMD Epyc “Rome” generation processors already shipping with Nvidia’s DGX A100 server.
The two superchips will serve two different markets, according to Paresh Kharya, senior director of product management and marketing at Nvidia. The Grace Hopper Superchip is intended to address giant-scale AI and HPC, with a focus on the bottleneck of CPU system memory, he said.
“Bandwidth is limited, and when you connect the CPU and GPU in a traditional server, the flow of data from the system memory to the GPU is bottlenecked by the PCIe slot,” he said. “So by putting the two chips together and interconnecting them with our NVLink interconnect, we can unblock that memory.”
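To get a feel for the bottleneck Kharya describes, here is a rough back-of-the-envelope sketch. The 900GB/s figure is the NVLink-C2C number quoted above; the PCIe figure is an assumption not taken from Nvidia's briefing (roughly the aggregate bidirectional peak of a PCIe 5.0 x16 slot), and the 80GB payload and function names are purely illustrative. Real transfers will be slower than either peak because of latency and protocol overhead.

```python
# Illustrative peak-bandwidth arithmetic only; not a benchmark.
NVLINK_C2C_GB_PER_S = 900.0     # figure quoted in the article
PCIE_GEN5_X16_GB_PER_S = 128.0  # ASSUMPTION: approx. bidirectional peak of PCIe 5.0 x16


def transfer_seconds(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time to move size_gb gigabytes over a link at peak bandwidth."""
    if bandwidth_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gb / bandwidth_gb_per_s


def bandwidth_ratio(fast_gb_per_s: float, slow_gb_per_s: float) -> float:
    """How many times more peak bandwidth the faster link offers."""
    if slow_gb_per_s <= 0:
        raise ValueError("bandwidth must be positive")
    return fast_gb_per_s / slow_gb_per_s


if __name__ == "__main__":
    payload_gb = 80.0  # hypothetical working set, chosen only for illustration
    for name, bw in (("PCIe 5.0 x16 (assumed)", PCIE_GEN5_X16_GB_PER_S),
                     ("NVLink-C2C", NVLINK_C2C_GB_PER_S)):
        print(f"{name}: {transfer_seconds(payload_gb, bw):.3f} s for {payload_gb:.0f} GB")
    ratio = bandwidth_ratio(NVLINK_C2C_GB_PER_S, PCIE_GEN5_X16_GB_PER_S)
    print(f"Peak bandwidth ratio: {ratio:.1f}x")
```

Under these assumptions the chip-to-chip link has roughly seven times the peak bandwidth of the slot, which is the gap the Grace Hopper design is meant to close.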
Both the Grace CPU Superchip and Grace Hopper Superchip eschew standard DRAM memory sticks in favor of LPDDR5X memory. The memory sits in the same package, physically right next to the chips themselves, rather than on memory sticks in DIMM slots. This direct connection offers up to 1TB/s of bandwidth while supporting in-memory error correction. Kharya said that memory performance is up to 30 times faster than in Nvidia’s current Ampere-based systems, which use traditional DIMM memory.
With the Grace CPU Superchip, Nvidia has a different emphasis. It put both CPUs as well as the LPDDR5X memory in a single package with a 500-watt power draw, which Kharya says is twice as energy efficient as leading CPUs. It may be more than that. A dual-socket x86 server will easily exceed 500 watts and have nowhere near as many cores. And that doesn’t take into account the power draw of the memory.
The memory bandwidth of the Grace CPU Superchip will benefit a range of applications that are not yet accelerated for GPUs.
Another potential market for the Grace CPU Superchip is AI inference. Some inference tasks require a lot of pre- and post-processing that has to happen on the CPU, while other parts of the application run on the GPU. Kharya also cited data analytics as a big potential market.
“There’s a long tail of applications that have not yet been accelerated on GPUs. Those would immediately benefit. They will really like the high-memory bandwidth to process faster as well as the speed of the CPU cores,” said Kharya.
Nvidia said Grace CPU Superchip and Grace Hopper Superchip should ship by the end of this year or the beginning of next year.