You’ve likely heard of DeepSeek: The Chinese company released a pair of open large language models (LLMs), DeepSeek-V3 in December 2024 and DeepSeek-R1 in January 2025, making them available to anyone for free use and modification. Then, in January, the company released a free chatbot app, which quickly gained popularity and rose to the top spot in Apple’s app store. The DeepSeek models’ excellent performance, which rivals that of the best closed LLMs from OpenAI and Anthropic, spurred a stock market rout on 27 January that wiped more than US $600 billion off leading AI stocks.
Proponents of open AI models, however, have met DeepSeek’s releases with enthusiasm. Over 700 models based on DeepSeek-V3 and R1 are now available on the AI community platform Hugging Face. Collectively, they’ve received over five million downloads.
Cameron R. Wolfe, a senior research scientist at Netflix, says the enthusiasm is warranted. “DeepSeek-V3 and R1 legitimately come close to matching closed models. Plus, the fact that DeepSeek was able to make such a model under strict hardware limitations due to American export controls on Nvidia chips is impressive.”
DeepSeek-V3 cost less than $6M to train
It’s that second point, hardware limitations due to U.S. export restrictions imposed in 2022, that highlights DeepSeek’s most surprising claims. The company says the DeepSeek-V3 model cost roughly $5.6 million to train using Nvidia’s H800 chips. The H800 is a lower-performance version of Nvidia hardware, designed to pass the standards set by the U.S. export ban, a ban meant to stop Chinese companies from training top-tier LLMs. (The H800 chip itself was subsequently banned, in October 2023.)
DeepSeek achieved impressive results on less capable hardware with a “DualPipe” parallelism algorithm designed to work around the Nvidia H800’s limitations. It uses low-level programming to precisely control how training tasks are scheduled and batched. The model also uses a “mixture-of-experts” (MoE) architecture, which comprises many smaller neural networks, the “experts,” that can be activated independently. Because only a subset of the experts runs for any given input, and each expert is smaller and more specialized, less memory is required to train the model, and compute costs are lower once the model is deployed.
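DeepSeek-V3’s actual routing scheme is considerably more elaborate than this, but a toy sketch conveys the core MoE idea: a small “router” network scores the experts for each token, and only the top-scoring few are executed. The following PyTorch snippet is a minimal illustration, not DeepSeek’s implementation; all names and sizes are invented for the example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Toy mixture-of-experts layer: a router picks the top-k experts per
    token, so only a fraction of the parameters run for any given input."""
    def __init__(self, dim: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        ])
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, dim). Route each token to its top-k experts.
        weights = F.softmax(self.router(x), dim=-1)          # (tokens, num_experts)
        topk_w, topk_idx = weights.topk(self.top_k, dim=-1)  # keep only k experts
        topk_w = topk_w / topk_w.sum(dim=-1, keepdim=True)   # renormalize gate weights
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (topk_idx == e).any(dim=-1)  # tokens routed to expert e
            if mask.any():
                gate = topk_w[mask][topk_idx[mask] == e].unsqueeze(-1)
                out[mask] += gate * expert(x[mask])
        return out

# Smoke test: 16 tokens of width 64; only 2 of the 8 expert MLPs run per token.
layer = MoELayer(dim=64)
print(layer(torch.randn(16, 64)).shape)  # torch.Size([16, 64])
```

The payoff is in the routing: every expert’s weights exist in memory, but each token pays the compute cost of only two expert MLPs rather than all eight, which is why sparse MoE models can carry huge total parameter counts at modest inference cost.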
The result is DeepSeek-V3, a large language model with 671 billion parameters, of which only about 37 billion activate for any given token. While OpenAI doesn’t disclose the parameter counts of its cutting-edge models, they’re speculated to exceed one trillion. Despite that, DeepSeek-V3 achieved benchmark scores that matched or beat OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet.
And DeepSeek-V3 isn’t the company’s only star; it also released a reasoning model, DeepSeek-R1, with chain-of-thought reasoning like OpenAI’s o1. While R1 isn’t the first open reasoning model, it’s more capable than prior ones, such as Alibaba’s QwQ. As with DeepSeek-V3, it achieved its results with an unconventional approach.
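Because the weights are open, readers can poke at that reasoning behavior themselves. Here is a minimal sketch using Hugging Face’s transformers pipeline with one of the small distilled R1 checkpoints published on Hugging Face; the checkpoint choice, prompt, and generation settings are assumptions for illustration, not an official recipe.

```python
from transformers import pipeline

# A rough sketch, not DeepSeek's own code. The checkpoint below is one of the
# small distilled R1 variants on Hugging Face; any R1-family model works similarly.
generator = pipeline(
    "text-generation",
    model="deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B",
)

prompt = "How many positive integers less than 100 are divisible by 3 or by 5?"
output = generator(prompt, max_new_tokens=512)

# R1-style models typically emit their chain of thought between <think>...</think>
# tags before stating a final answer, so the raw text shows the reasoning trace.
print(output[0]["generated_text"])
```

Unlike a conventional chat model, the interesting part of the output is the reasoning trace itself, which is exactly what chain-of-thought training is meant to produce.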
Most LLMs are trained with a process that includes supervised fine-tuning…
The post “DeepSeek Revolutionizes AI with Open Large Language Models” by Matthew S. Smith was published on 31 January 2025 by spectrum.ieee.org