For those who enjoy rooting for the underdog, the latest MLPerf benchmark results will disappoint: Nvidia’s GPUs have dominated the competition yet again. This includes chart-topping performance on the latest and most demanding benchmark, pretraining the Llama 3.1 405B large language model. That said, computers built around AMD’s newest GPU, the Instinct MI325X, matched the performance of Nvidia’s H200, Blackwell’s predecessor, on the most popular LLM fine-tuning benchmark, suggesting that AMD remains one generation behind Nvidia.
MLPerf Training is one of the machine learning competitions run by the MLCommons consortium. “AI performance sometimes can be sort of the Wild West. MLPerf seeks to bring order to that chaos,” says Dave Salvator, director of accelerated computing products at Nvidia. “This is not an easy task.”
The competition consists of six benchmarks, each probing a different industry-relevant machine learning task. The benchmarks are content recommendation, large language model pretraining, large language model fine-tuning, object detection for machine vision applications, image generation, and graph node classification for applications such as fraud detection and drug discovery.
The large language model pretraining task is the most resource-intensive, and this round it was updated to be even more so. The term “pretraining” is somewhat misleading: it might give the impression that it’s followed by a phase called “training.” It isn’t. Pretraining is where most of the number crunching happens; what follows is usually fine-tuning, which refines the model for specific tasks.
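To make the distinction concrete, here is a minimal, entirely hypothetical PyTorch sketch (a toy model and random tokens standing in for real data). Both phases run essentially the same next-token-prediction loop; what differs is that pretraining consumes a vastly larger corpus and compute budget, while fine-tuning starts from the pretrained weights and adapts them briefly on a narrower dataset:

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM; all sizes here are hypothetical.
vocab_size, dim = 100, 32
model = nn.Sequential(nn.Embedding(vocab_size, dim), nn.Linear(dim, vocab_size))
loss_fn = nn.CrossEntropyLoss()

def train(steps, lr, desc):
    # Both phases use the same next-token objective; the data, step
    # count, and (typically) learning rate are what change.
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    for _ in range(steps):
        tokens = torch.randint(0, vocab_size, (8, 17))  # random stand-in data
        logits = model(tokens[:, :-1])                  # predict token t+1 from token t
        loss = loss_fn(logits.reshape(-1, vocab_size), tokens[:, 1:].reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"{desc}: final loss {loss.item():.3f}")

train(steps=200, lr=1e-3, desc="pretraining (bulk of the compute)")
train(steps=20, lr=1e-5, desc="fine-tuning (starts from pretrained weights)")
```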
In previous iterations, the pretraining was done on the GPT-3 model. This iteration, it was replaced by Meta’s Llama 3.1 405B, which is more than twice the size of GPT-3 and uses a context window four times as large. The context window is how much input text the model can process at once. The larger benchmark reflects the industry trend toward ever-larger models and also incorporates some architectural updates.
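As a rough illustration of what the context window bounds, here is a small, hypothetical Python sketch: a long token stream is split into training sequences no longer than the window (toy numbers; the real windows span thousands of tokens):

```python
# Hypothetical toy values for illustration only.
context_window = 8
token_stream = list(range(30))  # stand-in for a tokenized document

# Each training (or inference) sequence must fit within the context window.
chunks = [token_stream[i:i + context_window]
          for i in range(0, len(token_stream), context_window)]
print([len(c) for c in chunks])  # [8, 8, 8, 6]
```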
Blackwell Tops the Charts, AMD on Its Tail
For all six benchmarks, the fastest training time was on Nvidia’s Blackwell GPUs. Nvidia itself submitted to every benchmark (other companies also submitted using various computers built around Nvidia GPUs). Nvidia’s Salvator emphasized that this is the first deployment of Blackwell GPUs at scale and that this performance is only likely to improve. “We’re still fairly early in the Blackwell development life cycle,” he says.
This is the first time AMD has submitted to the training benchmark, although in previous years other companies have submitted using computers that included AMD GPUs. In the most popular benchmark, LLM fine-tuning, AMD demonstrated that its latest Instinct MI325X GPU performed on par with Nvidia’s H200s. Additionally, the Instinct MI325X showed a 30 percent improvement over its predecessor, the Instinct MI300X. (The main difference between the two is that…

The post “Nvidia Blackwell Reigns Supreme in MLPerf Training Benchmark” by Dina Genkina was published on 06/04/2025 by spectrum.ieee.org