Description:
Two Minute Papers host Dr. Károly Zsolnai-Fehér argues that DeepSeek’s expanded 80-page technical report may represent the first complete, openly reproducible recipe for building ChatGPT-level AI, in direct contrast to OpenAI’s GPT-4 paper, which explicitly omitted architecture, hardware, compute, and training details. The video walks through five key technical findings from the DeepSeek R1 research that Zsolnai-Fehér considers genuine breakthroughs.
The first contribution is GRPO (Group Relative Policy Optimization), which replaces the expensive separate critic (“teacher”) model that standard PPO training requires: the model generates 16 candidate answers per question and each answer is scored relative to the rest of its own group, dramatically reducing training cost (a minimal sketch follows below). The second is DeepSeek R1’s discovery of chain-of-thought reasoning through pure reinforcement learning: starting with zero human examples, the model climbed from roughly 15% to nearly 80% accuracy on competition mathematics by itself. The third is emergent self-correction: during that training, the model spontaneously developed behaviors such as pausing to reconsider its answers. The fourth finding examines why a small number of initial examples (a “flashlight”) accelerates learning in natural-language tasks, more than tripling AlpacaEval performance, while adding little to abstract math. The fifth and most impactful contribution is distillation: DeepSeek used R1 to generate 800,000 reasoning examples, allowing much smaller models to inherit its capabilities.
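To make the group-relative idea concrete, here is a minimal Python sketch of how a GRPO-style scheme can score a group of sampled answers against each other. The group size of 16 comes from the video; the binary reward and the exact normalization are common formulations and should be read as illustrative assumptions rather than the report’s precise recipe.

```python
import numpy as np

def group_relative_advantages(rewards):
    """Score each sampled answer relative to its own group:
    advantage_i = (r_i - mean(r)) / std(r). No separate critic
    (value) model is needed, which is what cuts the training cost."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + 1e-8)  # epsilon guards a zero-spread group

# Illustrative run: 16 answers to one math problem, rewarded 1.0 when
# the final answer is correct and 0.0 otherwise (a rule-based reward).
rewards = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0]
print(group_relative_advantages(rewards))
# Correct answers receive positive advantages and incorrect ones negative,
# so the policy update pushes probability mass toward the better answers.
```

The design point is that the group mean stands in for the learned value baseline that PPO would otherwise need, so the extra critic network (and its training) disappears entirely.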
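The pure reinforcement-learning result relies on a reward the model cannot game, and for verifiable domains like competition math a simple rule-based check suffices, with no human labels and no learned reward model. Below is a sketch of such a check; the assumption that final answers appear in a \boxed{...} marker is a common convention, not a detail confirmed by the video.

```python
import re

def accuracy_reward(model_output: str, reference_answer: str) -> float:
    """Return 1.0 if the model's final boxed answer matches the reference,
    else 0.0. Purely rule-based: no human labels, no learned reward model."""
    match = re.search(r"\\boxed\{([^}]*)\}", model_output)
    if match is None:
        return 0.0  # no parseable final answer earns no reward
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

print(accuracy_reward(r"Adding the roots gives \boxed{42}", "42"))  # 1.0
print(accuracy_reward(r"I believe it is \boxed{41}", "42"))         # 0.0
```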
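Mechanically, the distillation step is ordinary supervised fine-tuning: the small model imitates R1’s generated reasoning traces under a next-token cross-entropy loss. The sketch below uses the Hugging Face Transformers API; the model identifier, the toy trace, and the loader are placeholders, and details such as batching, learning rate, and prompt formatting are assumptions, not the report’s settings.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def load_r1_traces():
    # Stand-in for the ~800,000 (question, reasoning-trace) pairs
    # that R1 generated in the actual recipe.
    yield ("What is 2 + 2?", " Let's verify: 2 + 2 = 4. The answer is 4.")

# "small-base-model" is a placeholder identifier, not a real checkpoint.
tok = AutoTokenizer.from_pretrained("small-base-model")
model = AutoModelForCausalLM.from_pretrained("small-base-model")
opt = torch.optim.AdamW(model.parameters(), lr=1e-5)

model.train()
for question, trace in load_r1_traces():
    ids = tok(question + trace, return_tensors="pt").input_ids
    # Ordinary next-token prediction on the teacher's full trace:
    # the small model inherits the reasoning style by imitation.
    loss = model(input_ids=ids, labels=ids).loss
    loss.backward()
    opt.step()
    opt.zero_grad()
```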
The video positions this release as a landmark for open-source AI development and scientific reproducibility.
📺 Source: Two Minute Papers · Published February 04, 2026
🏷️ Format: Deep Dive