In depth: Is this a $6m solution to a $209bn problem?
If you’ve never heard of DeepSeek before, you are not alone. The company was founded in 2023 by a hedge fund manager in Hangzhou, China. Before it revealed its new AI system a few weeks ago, and published an accompanying research paper that explained how it was done, only AI experts would have known it in the west.
But after its launch last week, the DeepSeek app quickly became the most popular free app in the US. And when the company revealed (£) what it said was the remarkably low cost of its system, it sparked a rapid rethink of where the future of AI might lie – with chaotic stock market consequences.
Here’s what you need to know.
Why is DeepSeek such a big deal?
Until now, the most successful AI models have needed vast amounts of computing power to train their chatbots: companies like OpenAI (whose chief executive, Sam Altman, is pictured above), the maker of ChatGPT, and Meta build their systems using as many as 16,000 of Nvidia’s chips – which are prized for their energy efficiency and ability to handle complex tasks, and sell for $30,000 to $40,000 each.
But DeepSeek says that it trained its base AI model using about 2,000 less advanced Nvidia chips, for about $6m, in less than two months. Citigroup estimates that (£) Microsoft, Meta, Amazon and Alphabet’s capital spending hit about $209bn last year, with 80% of that going on data centres.
DeepSeek-R1, the company’s “reasoning” model, which can work through difficult mathematical and scientific problems it has not been trained on directly, is said to perform the same complex tasks as OpenAI’s o1 model – at a price to business users that is 20 to 50 times cheaper.
We should exercise some caution about what DeepSeek says it can do, and there are some who claim that the story is too good to be true: on his X feed, Elon Musk agreed with Alexandr Wang, the CEO of AI firm Scale, who suggested that DeepSeek actually has about 50,000 of Nvidia’s most advanced chips but cannot say so because of American export controls. But Wang did not provide evidence for the suggestion.
On the other hand, there are good reasons to think that the claims are credible: because its model is open source – unlike the models powering OpenAI’s products, despite the name – anyone can check its workings.
Altman, for his part, said on Monday night that DeepSeek was “impressive, particularly around what they’re able to deliver for the price” and that OpenAI would accelerate the release of some upcoming products in response. He added: “We will obviously deliver much better models and also it’s legit invigorating to have a new competitor!”
How did they do it?
One of the key differences between DeepSeek and the better-known AI systems is its use of a technique called “mixture of experts”. Essentially, this means that instead of deploying its full computing force in every instance, it only activates the share that is relevant to the task at hand.
Morgan Brown, an AI staffer at Dropbox, likens this to “having a huge team but only calling in the experts you actually need for the task”, whereas traditional models have “one person be a doctor, lawyer, AND engineer”.
A model like OpenAI’s has 1.8 trillion parameters, or variables, which are active all the time; DeepSeek has 671 billion parameters, but only 37 billion active at once, Brown said. That has led to a view that while OpenAI is more powerful, DeepSeek is good enough for the average business user mindful of their bottom line.
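The idea Brown describes – a router that picks a few specialist sub-networks per input, leaving the rest idle – can be sketched in a few lines. This is a minimal illustration of the mixture-of-experts technique in general, not DeepSeek’s actual architecture; the sizes (8 experts, 2 active, 16-dimensional inputs) are toy numbers chosen purely for readability.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sizes for illustration only – not DeepSeek's real configuration.
NUM_EXPERTS = 8   # total "specialists" available
TOP_K = 2         # how many the router activates per input
DIM = 16          # size of each input vector

# Each expert is a simple linear layer (a weight matrix).
experts = [rng.standard_normal((DIM, DIM)) / np.sqrt(DIM)
           for _ in range(NUM_EXPERTS)]
# The router scores how relevant each expert is to a given input.
router = rng.standard_normal((DIM, NUM_EXPERTS)) / np.sqrt(DIM)

def moe_forward(x):
    """Route input x through only the top-k experts, not all of them."""
    scores = x @ router                    # one relevance score per expert
    top = np.argsort(scores)[-TOP_K:]      # indices of the k best-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only TOP_K of NUM_EXPERTS experts do any computation for this input.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

x = rng.standard_normal(DIM)
y = moe_forward(x)
```

In this sketch only 2 of the 8 expert matrices are ever multiplied for a given input, which is the source of the savings Brown points to: total parameters can be very large while the compute per query stays proportional to the few experts actually activated.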
Ironically enough, if it is true that DeepSeek engineers achieved what they have without Nvidia’s cutting-edge chips, their success appears to have been born of necessity: the US has put such restrictive rules in place around the export of the most sophisticated Nvidia chips that the company was forced to innovate. Those rules were specifically created to prevent China catching up with the US AI industry.