A British chip startup has launched what it claims is the world’s most complex AI chip: the Colossus MK2, or GC200 IPU (intelligence processing unit). Graphcore is positioning the MK2 against Nvidia’s Ampere-based A100 GPU for AI applications.
The MK2 and its predecessor MK1 are designed specifically to handle very large machine-learning models. The MK2 processor has 1,472 independent processor cores and 8,832 separate parallel threads, all supported by 900MB of in-processor RAM.
Graphcore says the MK2 offers a 9.3-fold improvement in BERT-Large training performance over the MK1, an 8.5-fold improvement in BERT-3Layer inference performance, and a 7.4-fold improvement in EfficientNet-B3 training performance.
BERT, or Bidirectional Encoder Representations from Transformers, is a natural language processing pre-training technique developed by Google and used in its search engine.
And Graphcore isn’t stopping at just the chip. For a relatively young startup (it was founded in 2016), Graphcore has built a remarkable ecosystem around its processors. Most chip startups focus solely on their silicon; Graphcore offers considerably more.
It sells the GC200 through its new IPU-Machine M2000, which contains four GC200 chips in a 1U box and delivers 1 petaflop of total compute power, according to the company. Graphcore notes you can get started with a single IPU-Machine M2000 box directly connected to an existing x86 server or add up to a total of eight IPU-Machine M2000s connected to one server. For larger systems, it offers the IPU-POD64, comprising 16 IPU-Machine M2000s built into a standard 19-inch rack.
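The figures above can be cross-checked with a quick back-of-the-envelope sketch. The variable names below are illustrative; only the numbers come from Graphcore’s announcement.

```python
# Sanity-check Graphcore's published system figures.
PFLOPS_PER_M2000 = 1.0    # one IPU-Machine M2000 delivers 1 petaflop
CHIPS_PER_M2000 = 4       # four GC200 chips per 1U box
M2000S_PER_POD64 = 16     # an IPU-POD64 holds 16 M2000s in a 19-inch rack

pflops_per_chip = PFLOPS_PER_M2000 / CHIPS_PER_M2000
pod64_pflops = M2000S_PER_POD64 * PFLOPS_PER_M2000

print(pflops_per_chip)  # 0.25 petaflop per GC200
print(pod64_pflops)     # 16.0 petaflops per IPU-POD64 rack
```

So each GC200 contributes roughly a quarter of a petaflop, and a fully populated IPU-POD64 works out to 16 petaflops.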
Connecting IPU-Machine M2000s and IPU-PODs at scale is handled by Graphcore’s new IPU-Fabric technology, designed from the ground up for machine-intelligence communication. It delivers a dedicated, low-latency fabric that connects IPUs across the entire data center.
Graphcore’s Virtual-IPU software integrates with workload management and orchestration software to serve many different users for training and inference, and it allows the available resources to be adapted and reconfigured from job to job.
The startup says its new hardware is completely plug-and-play, and that customers will be able to connect up to 64,000 IPUs together for a total of 16 exaFLOPs of computing power.
That’s a BIG claim. Intel, Arm, AMD, Fujitsu, and Nvidia are still pushing toward one exaflop, and Graphcore is claiming 16 times that.
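The 16-exaFLOP figure does follow from the company’s own numbers, as this short arithmetic sketch shows (variable names are illustrative; the inputs are from the announcement):

```python
# Check the 16-exaFLOP claim against the per-box figures.
MAX_IPUS = 64_000          # maximum IPUs Graphcore says can be connected
PFLOPS_PER_IPU = 1.0 / 4   # 1 petaflop per M2000, shared by 4 GC200 chips

total_pflops = MAX_IPUS * PFLOPS_PER_IPU
total_exaflops = total_pflops / 1000  # 1 exaflop = 1,000 petaflops

print(total_exaflops)  # 16.0, matching the claimed 16 exaFLOPs
```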
Another key element of Graphcore’s offering is its Poplar software stack, designed from scratch for the IPU and fully integrated with standard machine-learning frameworks, so developers can port existing models easily and get up and running quickly in a familiar environment. For developers who want full control to extract maximum performance from the IPU, Poplar enables direct IPU programming in Python and C++.
Graphcore has some significant early adopters of the MK2 system, including the University of Oxford, the U.S. Department of Energy’s Lawrence Berkeley National Laboratory, and J.P. Morgan, which is focused on natural language processing and speech recognition.
IPU-Machine M2000 and IPU-POD64 systems are available to pre-order today, with full production volume shipments starting in Q4 2020. Early-access customers can evaluate IPU-POD systems in the cloud through Graphcore’s cloud partner Cirrascale. Graphcore plans to announce OEM and channel partners in the coming months.