Description:
DeepSeek AI has published research on Engram, a memory architecture that could address one of the costliest inefficiencies in modern large language models. Dr. Károly Zsolnai-Fehér of Two Minute Papers explains the core problem: standard transformer models reconstruct all information from scratch for every query, even simple factual lookups, because they lack a lightweight mechanism for direct retrieval. Engram addresses this by replacing 20-25% of the compute-heavy mixture-of-experts (MoE) layers with n-gram embeddings combined with multi-head hashing, essentially giving the model an indexed lookup table for frequently needed facts.
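The video doesn't reproduce the paper's implementation, but a minimal sketch of the idea, hashing each token's trailing n-gram into shared embedding tables with several independent hash heads and summing the fetched rows, might look like this (all names, sizes, and the hashing scheme are illustrative assumptions, not DeepSeek's code):

```python
import torch
import torch.nn as nn

class NGramHashMemory(nn.Module):
    """Toy sketch of an n-gram hashed lookup memory (illustrative only).

    Each token's trailing n-gram is hashed by several independent hash
    heads into embedding tables; the retrieved vectors are summed to
    form a cheap O(1) "fact lookup" feature per position.
    """

    def __init__(self, dim, num_heads=4, table_size=2**20, ngram=2):
        super().__init__()
        self.ngram = ngram
        self.table_size = table_size
        # One embedding table per hash head.
        self.tables = nn.ModuleList(
            nn.Embedding(table_size, dim) for _ in range(num_heads)
        )
        # Random odd multipliers give cheap, roughly independent hashes.
        self.register_buffer(
            "mults", torch.randint(1, 2**31 - 1, (num_heads, ngram)) | 1
        )

    def forward(self, token_ids):  # token_ids: (batch, seq)
        # Build each position's trailing n-gram by stacking shifted
        # copies of the sequence (the first few positions wrap around,
        # which a real implementation would pad instead).
        shifted = [torch.roll(token_ids, i, dims=1) for i in range(self.ngram)]
        grams = torch.stack(shifted, dim=-1)  # (batch, seq, ngram)
        out = 0
        for head, table in enumerate(self.tables):
            # Multiplicative hash of the n-gram, reduced modulo table size.
            h = (grams * self.mults[head]).sum(-1) % self.table_size
            out = out + table(h)
        return out  # (batch, seq, dim): retrieved memory vectors
```

The point of the sketch is that retrieval costs a handful of integer operations and memory fetches per token rather than a matrix multiply, which is what makes swapping it in for a fraction of the MoE layers attractive.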
The benchmark results are unusually clean. Rather than the typical research trade-off of improving on some metrics while regressing on others, Engram improves on every benchmark tested compared to the standard MoE baseline. A targeted ablation provides mechanistic insight into why: disabling the Engram memory module caused a 70% drop in trivia recall while reading comprehension held at 93%, suggesting the model has cleanly separated fact storage from reasoning. The architecture also includes a context-aware gating mechanism that cross-checks retrieved memories against the current context and discards irrelevant entries before they can influence the output.
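The video doesn't detail the gate's exact form. A common pattern for this kind of context-conditioned filtering, assumed here purely for illustration, is a learned sigmoid gate that scores each retrieved vector against the current hidden state:

```python
import torch
import torch.nn as nn

class ContextGate(nn.Module):
    """Illustrative context-aware gate (not the paper's exact mechanism).

    Scores each retrieved memory vector jointly with the current hidden
    state and scales it toward zero when the two don't agree, so an
    irrelevant lookup never reaches the residual stream.
    """

    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(2 * dim, 1)

    def forward(self, hidden, memory):  # both: (batch, seq, dim)
        # Relevance in [0, 1], judged from hidden state and memory together.
        g = torch.sigmoid(self.score(torch.cat([hidden, memory], dim=-1)))
        return hidden + g * memory  # gated residual update
```

When the gate outputs a value near zero, the retrieved entry is effectively discarded, which matches the behavior described in the video: memories are cross-checked against context before they can influence the output.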
Beyond the raw performance numbers, the implications for on-device AI are significant. If factual retrieval can be handled by a simple, efficient lookup structure rather than expensive neural computation, future models could be substantially smaller and cheaper to run, pointing toward capable AI that operates locally without cloud subscriptions.
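To make "cheaper to run" concrete, here is a back-of-envelope comparison with illustrative layer sizes (assumed for this example, not taken from the paper or the video):

```python
# Back-of-envelope FLOP comparison with illustrative sizes.
d_model, d_ffn, experts_active = 4096, 14336, 2

# One token through an active MoE FFN: two matmuls per expert,
# each ~2 * d_model * d_ffn multiply-adds.
moe_flops = experts_active * 2 * (2 * d_model * d_ffn)

# One token through a hashed n-gram lookup: a few integer hash ops
# plus summing ~4 fetched embedding rows (one add per dimension each).
lookup_flops = 4 * d_model

print(f"MoE FFN: ~{moe_flops / 1e6:.0f} MFLOPs per token")
print(f"Lookup:  ~{lookup_flops / 1e3:.0f} KFLOPs per token")
print(f"Ratio:   ~{moe_flops / lookup_flops:,.0f}x")
```

Under these assumed sizes the lookup is four to five orders of magnitude cheaper per token, which is the intuition behind the claim that offloading factual retrieval could shrink on-device models.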
📺 Source: Two Minute Papers · Published March 24, 2026
🏷️ Format: Deep Dive
