DeepSeek and the Effects of GPU Export Controls
Last week, DeepSeek unveiled their V3 model, trained on just 2,048 H800 GPUs, a fraction of the hardware used by OpenAI or Meta. DeepSeek claims the model matches or beats GPT-4 and Claude on several benchmarks.
What's interesting isn't just the results, but how they got there.
The Numbers Game
Let's look at the raw figures:
- Training cost: $5.5M (vs $40M for GPT-4)
- GPU count: 2,048 H800s (vs estimated 20,000+ H100s for major labs)
- Parameters: 671B
- Training compute: 2.788M GPU hours
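Taken together, these figures imply a rate of roughly $2 per GPU-hour and about two months of wall-clock training time. A quick back-of-the-envelope check, using only the numbers reported above:

```python
# Sanity-check the published training figures (all inputs from the article).
TOTAL_COST_USD = 5.5e6   # reported training cost
GPU_HOURS = 2.788e6      # reported total GPU hours
GPU_COUNT = 2048         # H800s used

cost_per_gpu_hour = TOTAL_COST_USD / GPU_HOURS
wall_clock_days = GPU_HOURS / GPU_COUNT / 24  # assumes all GPUs run concurrently

print(f"Implied cost per GPU-hour: ${cost_per_gpu_hour:.2f}")   # ~ $1.97
print(f"Implied wall-clock time:   {wall_clock_days:.0f} days") # ~ 57 days
```

The roughly $2/GPU-hour figure is in line with typical cloud rental rates for this class of hardware, which is one reason the headline cost is plausible rather than an accounting trick.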