The Pragmatic Engineer

The Pulse

The Pulse #122: DeepSeek rocks the tech industry
An almost unknown Chinese lab releases an AI model that’s open, free, and as good as ChatGPT’s best models. Oh, and it’s also cheaper to operate. This has sent shockwaves through the AI sector.

Gergely Orosz
Jan 30, 2025 ∙ Paid

The Pulse is a series covering insights, patterns, and trends within Big Tech and startups. Notice an interesting event or trend? Send me a message.

This week, a massive event shook the tech industry: a lesser-known Chinese AI lab stunned markets and tech professionals with its DeepSeek AI model, which feels on a par with OpenAI’s most capable publicly available model, o1. OpenAI has a more advanced o3 model, but it’s in preview and not publicly available yet. DeepSeek is free to use within the DeepSeek app, and the model is released openly for anyone to download and host.

Major AI companies are coming to terms with the fact that a small team in China with supposedly little funding, and no access to NVIDIA’s latest AI chips, could pull this feat off. It shatters the image of OpenAI’s invincibility, the notion that the US leads the AI race, and also raises the question of whether open models will turn advanced LLMs into a commodity.

Today, we cover:

  1. The first “thinking model” that feels fast – and is a hit

  2. About 4x cheaper — and possibly more efficient? — than ChatGPT

  3. Open model spreads fast

  4. OpenAI’s need to remain fully closed highlighted by DeepSeek

  5. How did DeepSeek do it, and why give it away for free?

  6. Geopolitics and export controls

  7. Google feared open source AI would win

1. The first “thinking model” that feels fast – and is a hit

On Monday, NVIDIA’s valuation plummeted from $3.5 trillion to $2.9 trillion: an almost $600B reduction in market cap on a 17% drop in the stock price. This was reported as the biggest single-day loss ever by a U.S. company. The cause? A new large language model (LLM) called DeepSeek, built by a Chinese AI startup of the same name, which has become an overnight sensation. On the same day, the DeepSeek app hit the #1 spot on the US App Store on both iOS and Android, overtaking ChatGPT, which was relegated to #2. DeepSeek has remained #1 since.

Top iOS apps in the US. It’s rare for an app by a Chinese developer to hit top spot. DeepSeek is #1 on Android as well

What’s the cause of DeepSeek’s sudden popularity? It’s thanks to the company updating the app to enable its “DeepThink (R1)” mode, which uses the DeepSeek-R1 model. This model is similar to OpenAI’s o1 in that it takes more ‘thinking time’ to respond, using extra compute to serve up a better response.

A big difference is that DeepSeek displays the model’s “chain of thought”, whereas OpenAI hides what happens during the “thinking” phase. So, the model feels much more “snappy” than OpenAI’s o1, more transparent, and more relatable. And frankly, it’s a far better experience to watch the model “think out loud” for 30 seconds than to stare at ChatGPT’s spinner for the same time.

Here’s a good example of what happens when asking a question that trips a lot of LLMs up: “if a chicken says ‘all chickens are liars’ is the chicken telling the truth?” DeepSeek starts to “think” for nearly a minute, spitting out pages-worth of internal monologue:

DeepSeek shows its inner prompts while “thinking” up a response. It generated four times as much to answer the riddle I posed

In the end, the answer it generates concludes the question is a paradox. The output is pretty similar to what OpenAI’s o1 produces, except o1 takes around the same time (38 seconds) to “think” and doesn’t show anything to the user.

DeepSeek: free, OpenAI: $20-200/month. An obvious reason for the DeepSeek app’s popularity is that it’s free and offers virtually the same functionality as paid ChatGPT plans, which cost $20/month for limited access, and $200/month for unlimited access to the advanced o1 and o1-mini models. DeepSeek offers all of this for free, while somehow dealing with what look like enormous loads. The key to this is that DeepSeek seems to be several times cheaper to operate than existing models, like OpenAI’s.

2. About 4x cheaper — and possibly more efficient? — than ChatGPT

The team behind DeepSeek found dozens of ways to improve the efficiency of its model – and published these optimizations in a paper titled DeepSeek-V3 Technical Report. Novel optimization methods include:

  • Multi-Head Latent Attention (MLA). A novel attention mechanism that improves inference efficiency (how quickly and resourcefully the AI model processes and generates outputs after being trained) by using clever compression to reduce memory overhead.

  • DeepSeek Mixture-of-Experts (MoE). The DeepSeek-V3 model uses 671B parameters, but only 37B are activated for each token: the most relevant parts of the model. This makes computation a lot more efficient. While MoE has been around for several years, DeepSeek improved this architecture with DeepSeekMoE, an approach to use less computation for similar results.

  • Reduced KV cache usage. We previously covered how ChatGPT uses a KV cache as a workaround for self-attention scaling quadratically. The MLA technique allows for reduced KV cache usage.
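The Mixture-of-Experts routing idea can be sketched in a few lines. Everything below is a toy illustration – the sizes, gating function, and expert shapes are invented for clarity, and this is not DeepSeek’s actual implementation:

```python
import numpy as np

# Toy Mixture-of-Experts routing sketch (NOT DeepSeek's real code).
# DeepSeek-V3 has 671B total parameters but activates only ~37B per
# token; the core idea is top-k routing, shown here at tiny scale.

rng = np.random.default_rng(0)

D_MODEL = 8        # hidden size (toy)
N_EXPERTS = 16     # total experts
TOP_K = 2          # experts activated per token

# Each "expert" is a tiny feed-forward matrix; a router scores them all.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D_MODEL, N_EXPERTS))

def moe_forward(x):
    """Route one token vector to its top-k experts and mix their outputs."""
    scores = x @ router                 # one routing score per expert
    top = np.argsort(scores)[-TOP_K:]   # indices of the k highest-scoring experts
    weights = np.exp(scores[top])
    weights /= weights.sum()            # softmax over the chosen experts only
    # Only TOP_K of the N_EXPERTS matrices are touched for this token,
    # which is where the compute savings come from.
    out = sum(w * (x @ experts[i]) for w, i in zip(weights, top))
    return out, top

token = rng.standard_normal(D_MODEL)
out, used = moe_forward(token)
print(f"activated experts {sorted(used.tolist())} of {N_EXPERTS}")
```

The saving is structural: compute per token scales with the k activated experts, not with the total parameter count, which is how a 671B-parameter model can run with 37B-parameter cost per token.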

These and other optimizations result in cheaper training and operational costs. DeepSeek offers its model via an API, similar to ChatGPT. At present, DeepSeek’s V3 model is 4-5x cheaper than OpenAI’s GPT-4 model. Assuming both OpenAI and DeepSeek price their APIs at cost – or at a slight profit – this suggests a similar efficiency gain for DeepSeek.
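To see how API price translates into an efficiency estimate, here’s some back-of-envelope arithmetic. The per-million-token prices below are assumed placeholders in a plausible ballpark, not quoted rates – check the providers’ pricing pages for current numbers:

```python
# Back-of-envelope API cost comparison. Prices are ILLUSTRATIVE
# placeholders (USD per 1M tokens), not official rates.
GPT4_INPUT, GPT4_OUTPUT = 2.50, 10.00          # assumed GPT-4-class pricing
DEEPSEEK_INPUT, DEEPSEEK_OUTPUT = 0.55, 2.20   # assumed DeepSeek V3 pricing

def monthly_cost(input_mtok, output_mtok, in_price, out_price):
    """Cost in USD for a workload measured in millions of tokens."""
    return input_mtok * in_price + output_mtok * out_price

# Hypothetical workload: 100M input tokens, 20M output tokens per month.
gpt4 = monthly_cost(100, 20, GPT4_INPUT, GPT4_OUTPUT)
deepseek = monthly_cost(100, 20, DEEPSEEK_INPUT, DEEPSEEK_OUTPUT)

print(f"GPT-4-class: ${gpt4:,.2f}  DeepSeek: ${deepseek:,.2f}  "
      f"ratio: {gpt4 / deepseek:.1f}x")
```

With these assumed prices the ratio lands around 4.5x, which is the sense in which a price gap – if both vendors price near cost – implies a similar gap in serving efficiency.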

Update on 30 Jan: an earlier version of this post incorrectly stated a 10-40x efficiency gain. Thank you to Emmet for flagging this.

3. Open model spreads fast

Another big difference between DeepSeek and most other LLMs is that DeepSeek is open and free to use. The model has been released with open weights, and its license allows full freedom to use or modify it. Modifying such a complex model makes little sense for most, but this freedom of usage means any company can host its own DeepSeek models.

This approach is similar to Mistral’s with Mixtral (a permissive Apache 2.0 license), and Meta’s with Llama, although Llama comes with commercial usage limitations, meaning it falls short of the definition of open source. DeepSeek models are released under the permissive MIT license, which allows unrestricted commercial use. And commercial use is already booming. In just a few days, these companies have deployed DeepSeek for commercial usage:

  • Perplexity: Pro users get 500 R1 searches per day; free users get 5. In comparison, Pro users get a mere 10 ChatGPT o1 searches – it all comes down to cost! As a paid subscriber to The Pragmatic Engineer, you can get 12 months of access to Perplexity Pro.

  • Microsoft Azure: made DeepSeek available on its Azure AI Foundry service

  • Amazon AWS: DeepSeek available to use via Bedrock

  • Meta: testing DeepSeek internally, as reported by The Information

  • IBM: added support for DeepSeek on WatsonX AI (its enterprise-grade AI studio)

  • Databricks: offers this model, as per The Information

Usage will surely spread much wider in future. Who wouldn’t want a model that’s several times cheaper to operate than ones from OpenAI and Anthropic, while offering similar capabilities?

4. OpenAI’s need to remain fully closed highlighted by DeepSeek

The company most negatively impacted by DeepSeek is OpenAI, until now the leader in AI models and user mindshare.

This post is for paid subscribers
