the SCARIEST chart in AI


Description:

Wes Roth breaks down what he calls “the scariest chart in AI development history” — a METR benchmark tracking AI agents’ ability to complete tasks whose difficulty is measured in human expert hours. METR (Model Evaluation and Threat Research), a nonprofit that evaluates frontier AI risks, assembled hundreds of tasks across engineering, coding, machine learning, and cybersecurity, using human completion time as the y-axis rather than AI runtime. The result is a capability measure grounded in real labor displacement.
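METR's headline number is a "50% time horizon": the human task length at which an agent succeeds about half the time. The sketch below illustrates the idea with a toy estimator; the interpolation scheme and the sample data are illustrative only (METR fits a logistic curve to real task results):

```python
import math
from collections import defaultdict

def time_horizon_50(tasks):
    """Toy estimate of a 50%-success time horizon.

    `tasks` is a list of (human_hours, succeeded) pairs. We compute the
    success rate at each distinct task length and interpolate, in
    log-time, where that rate crosses 50%. This is an illustrative
    stand-in, not METR's actual methodology.
    """
    by_len = defaultdict(list)
    for hours, ok in tasks:
        by_len[hours].append(ok)
    points = sorted((h, sum(v) / len(v)) for h, v in by_len.items())
    for (h0, p0), (h1, p1) in zip(points, points[1:]):
        if p0 >= 0.5 > p1:  # success rate falls through 50% on this interval
            frac = (p0 - 0.5) / (p0 - p1)
            return math.exp(math.log(h0) + frac * (math.log(h1) - math.log(h0)))
    return None

# Hypothetical results: the agent reliably finishes short tasks, fails long ones.
results = [(1, 1), (2, 1), (4, 1), (8, 0), (16, 0)]
print(time_horizon_50(results))  # horizon lands between the 4 h and 8 h tasks
```

Interpolating in log-time rather than raw hours matches how these charts are plotted: task lengths span orders of magnitude, so the y-axis is logarithmic.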

The data shows a sharp acceleration: Claude Opus 4.5 reached a time horizon of roughly 5 hours at a 50% success rate, and the recently released Claude Opus 4.6 jumped to 14.5 hours, nearly two full workdays of expert output per session. The doubling pace of the time horizon has also compressed, from every 7 months historically to roughly every 4 months from 2023 onward, suggesting the trend line itself is bending upward.
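The doubling arithmetic is easy to check. A minimal sketch, assuming a fixed exponential trend (the ~5-hour and 14.5-hour horizons and the 4-month doubling time are the figures cited above; the projection is illustrative, not a forecast):

```python
import math

def months_to_reach(h0, h1, doubling_months):
    """Months needed to grow from horizon h0 to h1 at a fixed doubling time."""
    return doubling_months * math.log2(h1 / h0)

def horizon_after(h0, months, doubling_months):
    """Task horizon (hours) after `months`, assuming exponential growth."""
    return h0 * 2 ** (months / doubling_months)

# The 5 h -> 14.5 h jump implies about six months of progress at a
# 4-month doubling pace.
print(round(months_to_reach(5.0, 14.5, 4.0), 1))

# A year at the same pace is three more doublings: 14.5 * 8 = 116 hours.
print(horizon_after(14.5, 12, 4.0))
```

At a 4-month doubling time, the 5-to-14.5-hour jump corresponds to roughly six months of trend progress, which is why the compressed doubling pace, not any single release, is the chart's real story.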

Roth pairs the benchmark analysis with personal experience: Opus 4.6 completed a complex multi-year accounting reconciliation — work he estimates would cost hundreds of dollars with a CPA — in about 30-40 minutes while he played a video game. He also built and deployed a fully functional AI news aggregator (natural20.com) overnight without supervision, a project he estimates would take a human specialist one to two days. His key observation: many of these 14-hour-equivalent tasks aren’t one-off jobs. When an AI agent completes a complex workflow, it frequently automates that workflow permanently, multiplying real-world leverage well beyond what any benchmark captures.


📺 Source: Wes Roth · Published February 24, 2026
🏷️ Format: Deep Dive
