Description:
ARC-AGI 3 has officially launched as the first interactive version of the Abstraction and Reasoning Corpus benchmark, and Matthew Berman walks through what makes this release a meaningful step forward in measuring artificial general intelligence. Unlike coding, math, or science benchmarks, where top AI systems compete against the best human experts, ARC-AGI tasks are trivially solvable by average humans yet remain stubbornly difficult for frontier models: the defining characteristic that makes the benchmark compelling.
Berman traces the progression from ARC-AGI 1 (now nearly saturated, with top models approaching 93–94%) to ARC-AGI 2, where even the best current systems fall well short: GPT 5.4 Pro Extra High leads at 72% at a cost of $39 per task, followed by Gemini 3.1 Pro at 69% and Claude Opus 4.6 medium at 68%, while humans still achieve 100%. ARC Prize maintains a $2 million prize for full saturation.
The third iteration is a major format departure. Instead of pattern-completion puzzles, ARC-AGI 3 drops both humans and AI agents into an undescribed video game environment with zero instructions and a limited turn budget. Berman demonstrates live gameplay, showing how the challenge requires genuine exploration and generalization rather than pattern memorization. The benchmark is designed to resist the memorization strategies that allowed AI systems to climb earlier leaderboards, making it the most robust test of open-ended reasoning published to date.
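To make the format concrete, here is a minimal sketch of what an agent loop for this kind of benchmark could look like, assuming a Gym-style interface. The environment class, action names, and turn budget below are illustrative stand-ins, not the actual ARC Prize harness or API.

```python
# Hypothetical sketch of an ARC-AGI-3-style agent loop: no instructions,
# an opaque observation, and a hard turn budget. UnknownGameEnv, the action
# names, and TURN_BUDGET are all assumptions for illustration.
import random
from collections import Counter

class UnknownGameEnv:
    """Hypothetical environment: rules, goal, and scoring are hidden from the agent."""

    def reset(self) -> list[list[int]]:
        # Return an opaque 8x8 grid; the agent gets no legend or objective.
        return [[0] * 8 for _ in range(8)]

    def step(self, action: str) -> tuple[list[list[int]], bool]:
        # Returns (next observation, done). A real game would mutate hidden state here.
        return [[0] * 8 for _ in range(8)], False

ACTIONS = ["up", "down", "left", "right", "interact"]
TURN_BUDGET = 100  # the limited turn budget Berman describes

env = UnknownGameEnv()
obs = env.reset()
tried = Counter()

for turn in range(TURN_BUDGET):
    # With zero prior knowledge, the only viable opening strategy is
    # exploration: spread attempts across the action space (least-tried
    # first, random tiebreak) rather than pattern-matching against
    # memorized puzzles.
    action = min(ACTIONS, key=lambda a: (tried[a], random.random()))
    tried[action] += 1
    obs, done = env.step(action)
    if done:
        break
```

The point of the sketch is the constraint structure, not the policy: because the environment offers no task description, any scoring agent has to spend part of its budget discovering the rules before it can exploit them, which is exactly the memorization-resistant property Berman highlights.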
📺 Source: Matthew Berman · Published March 27, 2026
🏷️ Format: News Analysis