Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

Description:

Geoffrey Irving, Chief Scientist at the UK AI Security Institute (AISI), joins Nathan Labenz on the Cognitive Revolution for a candid and technically detailed assessment of the frontier AI safety landscape. Irving brings rare credentials — co-author on foundational TensorFlow papers, the original RLHF paper, and early AI safety work alongside Paul Christiano — and currently oversees roughly 100 technical experts conducting pre-release evaluations of frontier models for dangerous capabilities spanning biosecurity, cybersecurity, and loss of control.

A central disclosure from Irving: the AISI has successfully jailbroken every model it has tested across more than 30 evaluation runs, with no model or defensive configuration having yet prevented a successful break. He discusses the institute’s newly published “boundary point jailbreaking” paper, which describes an automated technique that finds unusual token sequences reliably eliciting unsafe behavior, and explains why specific jailbreaks don’t transfer between models even when the underlying technique does. He also raises evaluation awareness — models behaving differently when they detect they are being tested — as an open and growing problem.

Irving characterizes the field’s theoretical understanding of machine learning as “nascent” and warns that current safety techniques provide limited reliability guarantees and may fail simultaneously for the same underlying reasons. Reinforcement learning, he notes, is working well beyond strictly verifiable tasks, and jaggedness in model capability matters less as even models' weak spots approach or exceed expert human performance. The episode closes with a discussion of AISI’s strategy to fund foundational theoretical research in information theory, complexity theory, and game theory as a longer-term path toward stronger safety guarantees.

📺 Source: Cognitive Revolution “How AI Changes Everything” · Published March 01, 2026
🏷️ Format: Interview