Description:
Joel Becker from METR (Model Evaluation & Threat Research) joins Latent Space to discuss his organization's work quantifying AI capabilities and assessing catastrophic risk. METR's model time horizon chart, which measures how long AI agents can sustain autonomous task completion across successive model generations, has become one of the most cited visuals in both AI investment memos and policy discussions, and Becker provides rare context on its origins, methodology, and limitations.
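For readers unfamiliar with the chart: the time horizon is typically reported as the task length at which an agent's success rate falls to 50%. The sketch below is a rough illustration only, not METR's actual code; it assumes the approach of METR's published "Measuring AI Ability to Complete Long Tasks" methodology (logistic fit of success against log task length, read off the 50% crossing), and the task data is invented for the example.

```python
# Minimal sketch: estimate a model's 50% "time horizon" from per-task results.
# Assumption: each result is (human completion time in minutes, agent succeeded?).
# The numbers below are made up for illustration.
import math
import numpy as np
from sklearn.linear_model import LogisticRegression

results = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (15, 0),
    (30, 1), (30, 0), (60, 1), (60, 0), (120, 0), (240, 0),
]

# Model P(success) as a logistic function of log2(task length).
X = np.array([[math.log2(minutes)] for minutes, _ in results])
y = np.array([ok for _, ok in results])
clf = LogisticRegression().fit(X, y)

# The 50% time horizon is where the fitted logit w * log2(t) + b crosses zero.
w, b = clf.coef_[0][0], clf.intercept_[0]
horizon_minutes = 2 ** (-b / w)
print(f"Estimated 50% time horizon: {horizon_minutes:.0f} minutes")
```

Plotting this estimate for successive model generations against their release dates gives a chart of the kind discussed in the episode.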
Becker explains METR’s three-part framework: capabilities measurement (what models can do under controlled conditions), propensities assessment (what they actually do in the wild), and threat research (connecting both to specific catastrophic risk scenarios). He notes that the organization has shifted emphasis from autonomous replication threats toward AI R&D acceleration—the risk that automated research inside a major lab could trigger a capabilities explosion—as the primary scenario worth stress-testing. The episode also references METR’s published safety evaluations for GPT-5 and GPT-5.1, which concluded neither model currently meets the capability threshold for catastrophic misuse, while being explicit about the reasoning behind that conclusion.
A significant portion of the conversation explores whether AI capability growth will remain continuous or exhibit phase-transition discontinuities, drawing on physics analogies and METR’s own data showing surprising smoothness across model generations so far. Becker engages directly with the question of what would actually change his mind—pointing to full automation of AI R&D inside a top lab as the clearest signal. The episode also covers METR’s RCT methodology for measuring developer productivity and the underappreciated challenge of designing evaluation tasks that remain valid as models improve.
📺 Source: Latent Space · Published February 27, 2026
🏷️ Format: Podcast