AI Training:Newest Google and Nvidia Chips Speed AI Training

Nvidia, Oracle, Google, Dell and 13 other companies reported how long it takes their computers to train the key neural networks in use today. Among those results were the first glimpse of Nvidia’s next generation GPU, the B200, and Google’s upcoming accelerator, called Trillium. The B200 posted a doubling of performance on some tests versus today’s workhorse Nvidia chip, the H100. And Trillium delivered nearly a four-fold boost over the chip Google tested in 2023.

The benchmark tests, called MLPerf v4.1, consist of six tasks: recommendation, the pre-training of the large language models (LLM) GPT-3 and BERT-large, the fine tuning of the Llama 2 70B large language model, object detection, graph node classification, and image generation.

Training GPT-3 is such a mammoth task that it’d be impractical to do the whole thing just to deliver a benchmark. Instead, the test is to train it to a point that experts have determined means it is likely to reach the goal if you kept going. For Llama 2 70B, the goal is not to train the LLM from scratch, but to take an already trained model and fine-tune it so it’s specialized in a particular expertise—in this case,government documents. Graph node classification is a type of machine learning used in fraud detection and drug discovery.

As what’s important in AI has evolved, mostly toward using generative AI, the set of tests has changed. This latest version of MLPerf marks a complete changeover in what’s being tested since the benchmark effort began. “At this point all of the original benchmarks have been phased out,” says David Kanter, who leads the benchmark effort at MLCommons. In the previous round it was taking mere seconds to perform some of the benchmarks.

Performance of the best machine learning systems on various benchmarks has outpaced what would be expected if gains were solely from Moore’s Law [blue line]. Solid line represent current benchmarks. Dashed lines represent benchmarks that have now been retired, because they are no longer industrially relevant.MLCommons

According to MLPerf’s calculations, AI training on the new suite of benchmarks is improving at about twice the rate one would expect from Moore’s Law. As the years have gone on, results have plateaued more quickly than they did at the start of MLPerf’s reign. Kanter attributes this mostly to the fact that companies have figured out how to do the benchmark tests on very large systems. Over time, Nvidia, Google, and others have developed software and network technology that allows for near linear scaling—doubling the processors cuts training time roughly in half.

First Nvidia Blackwell training results

This round marked the first training tests for Nvidia’s next GPU architecture, called Blackwell. For the GPT-3 training and LLM fine-tuning, the Blackwell (B200) roughly doubled the performance of the H100 on a per-GPU basis. The gains were a little less robust but still substantial for recommender…

Read full article: AI Training:Newest Google and Nvidia Chips Speed AI Training

The post “AI Training:Newest Google and Nvidia Chips Speed AI Training” by Samuel K. Moore was published on 11/13/2024 by spectrum.ieee.org