Description:
Dr. Károly Zsolnai-Fehér of Two Minute Papers breaks down DeepSeek V4, a major new open-weights AI model documented in a 58-page research paper, explaining the technical innovations that let it compete with top commercial frontier models at dramatically lower cost. The core contribution is a three-layer compression system for the KV cache — the memory scratch pad that stores prompts and documents during inference. Token-level compression summarizes individual passages; Heavily Compressed Attention applies 128-to-1 structural compression to give the model a high-level overview; and Compressed Sparse Attention builds an index for targeted retrieval. Together, these reduce KV cache memory requirements by approximately 90% while largely preserving accuracy.
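The video does not walk through the exact algorithms, but the general idea of pairing a heavily compressed coarse view with a sparse index for targeted retrieval can be sketched in a few lines. The toy NumPy example below assumes mean pooling for the 128-to-1 coarse summaries and top-k block selection for the retrieval step; the names, ratios, and dimensions are illustrative stand-ins, not DeepSeek V4's actual implementation.

```python
# Toy sketch of a two-stage compressed KV cache lookup (illustrative only;
# not DeepSeek V4's published algorithm). Assumes mean pooling for the coarse
# 128-to-1 view and top-k block selection for targeted retrieval.
import numpy as np

BLOCK = 128            # coarse compression ratio: 128 cached tokens -> 1 summary vector
D = 64                 # head dimension (illustrative)
N_TOKENS = 128 * 512   # 65,536 cached tokens

rng = np.random.default_rng(0)
keys   = rng.standard_normal((N_TOKENS, D)).astype(np.float32)
values = rng.standard_normal((N_TOKENS, D)).astype(np.float32)

# Stage 1: "heavily compressed" view, one summary key per 128-token block.
block_keys = keys.reshape(-1, BLOCK, D).mean(axis=1)            # shape (512, D)

def sparse_attention(query, top_k=8):
    """Score the coarse summaries, keep only the top_k blocks,
    then run ordinary softmax attention over just those blocks' tokens."""
    block_scores = block_keys @ query                           # (512,)
    chosen = np.argsort(block_scores)[-top_k:]                  # indices of best blocks

    # Stage 2: targeted retrieval, gather full-resolution K/V for chosen blocks only.
    idx = np.concatenate([np.arange(b * BLOCK, (b + 1) * BLOCK) for b in chosen])
    k, v = keys[idx], values[idx]

    scores = k @ query / np.sqrt(D)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ v                                          # attention output, (D,)

out = sparse_attention(rng.standard_normal(D).astype(np.float32))
# Only top_k * BLOCK = 1,024 of the 65,536 cached entries are touched (~1.6%),
# which is the flavor of saving a compressed-plus-indexed cache aims for.
print(out.shape)  # (64,)
```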
The DeepSeek V4 Pro model features a 1 million token context window in open weights, a capability that was considered a flagship differentiator for Google Gemini not long ago, and reportedly outperforms Gemini 3.1 Pro on long-context recall benchmarks. The Pro model also requires roughly one-third the compute of its predecessor, while the lighter Flash variant needs about one-tenth. API pricing is cited as 8 to 30 times cheaper than Anthropic's Claude, depending on available discounts.
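To get a feel for why a roughly 90% cache reduction matters at a 1 million token context window, here is a back-of-envelope estimate. The layer count, head configuration, and precision below are assumptions chosen for illustration, not DeepSeek V4's published specifications.

```python
# Back-of-envelope KV cache size at a 1M-token context window.
# All model dimensions below are illustrative assumptions, not DeepSeek V4 specs.
tokens    = 1_000_000
layers    = 60            # assumed transformer depth
kv_dim    = 2 * 8 * 128   # keys + values, 8 KV heads x 128 dims per head (assumed)
bytes_per = 2             # fp16/bf16 storage

uncompressed_gb = tokens * layers * kv_dim * bytes_per / 1e9
compressed_gb   = uncompressed_gb * 0.10   # ~90% reduction claimed in the video

print(f"uncompressed: {uncompressed_gb:.0f} GB, compressed: {compressed_gb:.0f} GB")
# With these assumed numbers: ~246 GB shrinks to ~25 GB, the difference between
# needing a multi-GPU node just for the cache and fitting it on modest hardware.
```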
The video is careful to flag real limitations: DeepSeek V4 is unimodal (text only, no image or audio input), accuracy degrades when pushing against the edges of the context window, and two of the training stabilization techniques are not yet fully understood, even by the model's creators. Dr. Zsolnai-Fehér frames these honestly, making this a balanced technical overview rather than straight hype.
📺 Source: Two Minute Papers · Published May 06, 2026
🏷️ Format: Deep Dive