License No. 4978/GP-TTĐT issued by the Hanoi Department of Information and Communications on 14 October 2019 / Amended and supplemented ICP License No. 2107/GP-TTĐT issued by the Hanoi Department of Information and Communications on 13 July 2022.
© 2026 Index.vn
In MLPerf Inference v6.0, NVIDIA was the only manufacturer to submit results for DeepSeek-R1, and it recorded a nine-fold lead over its nearest competitor.
MLPerf Inference v6.0, developed by MLCommons, adds support for advanced inference and mixture-of-experts (MoE) models, including DeepSeek-R1, GPT-OSS-120B, and Mixtral 8x7B. The benchmark suite also broadens to dense large language models, generative-recommendation systems, and vision-language models, reflecting enterprise use cases. CEO Jensen Huang has described MLPerf as one of the most stringent benchmarks available.
The most notable comparisons come from the GB300 NVL72 configuration, setting its v5.1 results against v6.0.
For the DeepSeek-R1 task in Server mode, throughput increased from 2,907 to 8,064 tokens per second per GPU, a 2.77x improvement.
In Offline mode, throughput rose from 5,842 to 9,821 tokens per second per GPU, a 1.68x increase.
For the Llama 3.1 405B model, Server speed increased from 170 to 259 tokens per second per GPU (1.52x). Offline performance reached 271 tokens per second per GPU versus 224 tokens per second per GPU in the previous generation (1.21x).
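The multipliers above follow directly from the quoted per-GPU throughput figures. As a sanity check (a minimal sketch, not NVIDIA's or MLCommons' tooling), the script below recomputes each speedup from the numbers in the article:

```python
# Recompute the v5.1 -> v6.0 speedups from the per-GPU throughput
# figures quoted above (tokens per second per GPU).
results = {
    "DeepSeek-R1 (Server)":     (2907, 8064),
    "DeepSeek-R1 (Offline)":    (5842, 9821),
    "Llama 3.1 405B (Server)":  (170, 259),
    "Llama 3.1 405B (Offline)": (224, 271),
}

for name, (v51, v60) in results.items():
    print(f"{name}: {v60 / v51:.2f}x")
# DeepSeek-R1 (Server): 2.77x
# DeepSeek-R1 (Offline): 1.68x
# Llama 3.1 405B (Server): 1.52x
# Llama 3.1 405B (Offline): 1.21x
```

Each ratio matches the rounded multiplier reported in the article.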
NVIDIA said the majority of the gains come from software optimizations rather than hardware changes: since its first DeepSeek-R1 submission a few months earlier, the company has improved token throughput by 2.7x through software updates alone.
On the hardware side, the GB300 NVL72 configuration delivers up to 2.77x higher throughput than the GB200 NVL72, reflecting year-on-year improvement.
NVIDIA noted that it was the only vendor to submit DeepSeek-R1 results in last year's MLPerf Inference. In v6.0, the company said this advantage remains, pointing to limited participation from other chip makers, including AMD.
NVIDIA attributed its inference performance to what it described as an extremely tight co-design across the chip, system architecture, data-center design, and software. The company also said the MLPerf Inference v6.0 results are used to demonstrate token/USD and total cost of ownership (TCO) competitiveness in large-scale deployments.
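MLPerf itself does not publish cost figures, but a token/USD metric of the kind NVIDIA refers to can be derived from measured throughput and an assumed price per GPU-hour. The sketch below is purely illustrative; the $10/GPU-hour rate is a hypothetical placeholder, not published NVIDIA or cloud pricing:

```python
# Illustrative tokens-per-dollar calculation. The hourly rate used in the
# example is a hypothetical placeholder, not a published price.
def tokens_per_dollar(tokens_per_sec_per_gpu: float, usd_per_gpu_hour: float) -> float:
    """Tokens produced per dollar of GPU time."""
    tokens_per_hour = tokens_per_sec_per_gpu * 3600
    return tokens_per_hour / usd_per_gpu_hour

# Example: the 8,064 tok/s/GPU DeepSeek-R1 Server figure at an assumed $10/GPU-hour.
print(f"{tokens_per_dollar(8064, 10.0):,.0f} tokens per dollar")
# 2,903,040 tokens per dollar
```

Because the denominator is fixed for a given deployment, software gains like the 2.77x Server-mode improvement translate one-for-one into tokens per dollar.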
