Situational Awareness in Government, with UK AISI Chief Scientist Geoffrey Irving

Descriptions:

Geoffrey Irving, Chief Scientist at the UK AI Security Institute (AISI), joins Nathan Labenz on the Cognitive Revolution for a candid and technically detailed assessment of the frontier AI safety landscape. Irving brings rare credentials — co-author on foundational TensorFlow papers, the original RLHF paper, and early AI safety work alongside Paul Christiano — and currently oversees roughly 100 technical experts conducting pre-release evaluations of frontier models for dangerous capabilities spanning biosecurity, cybersecurity, and loss of control.

A central disclosure from Irving: the AISI has successfully jailbroken every model it has tested across more than 30 evaluation runs, with no model or defensive configuration having yet prevented a successful break. He discusses the institute’s newly published “boundary point jailbreaking” paper, which describes an automated technique that finds unusual token sequences reliably eliciting unsafe behavior, and explains why specific jailbreaks don’t transfer between models even when the underlying technique does. He also raises evaluation awareness — models behaving differently when they detect they are being tested — as an open and growing problem.

Irving characterizes the field’s theoretical understanding of machine learning as “nascent” and warns that current safety techniques provide limited reliability guarantees and may fail simultaneously for the same underlying reasons. Reinforcement learning, he notes, is working well beyond strictly verifiable tasks, and jaggedness in model capability matters less as even weak spots approach or exceed expert human performance. The episode closes with discussion of AISI’s strategy to fund foundational theoretical research in information theory, complexity theory, and game theory as a longer-term path toward stronger safety guarantees.


📺 Source: Cognitive Revolution “How AI Changes Everything” · Published March 01, 2026
🏷️ Format: Interview