AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

Interviews2 weeks ago

AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

Descriptions:

Zico Kolter and Matt Fredrikson — CMU professors and co-founders of AI security startup Gray Swan AI — join the Latent Space podcast to discuss the security landscape for AI agents in the era of widely deployed tools like Codex and Claude Code. The conversation establishes a key framing: AI systems have fundamentally different vulnerability profiles than traditional software. Models can be manipulated in ways analogous to social engineering, and because a small number of foundation models underpin most production deployments, a single discovered exploit can scale across an enormous attack surface simultaneously.

Gray Swan operates on both sides of this problem. Their automated red teaming system, SHADE, now outperforms human red teamers at breaking models — finding jailbreaks and policy violations faster, at greater scale, and with less human involvement. Kolter makes a counterintuitive point: model scale alone does not improve adversarial robustness. Making a model bigger does not make it harder to jailbreak; explicit adversarial training is required, and it must stay current as new attack techniques emerge.

The defensive side is CYGNAL (stylized Signal), a purpose-built filter model that sits between users, LLMs, and tool calls to detect policy violations in real time. Fredrikson explains that the red teaming capability is what makes CYGNAL effective — the same attack scenarios used to find vulnerabilities are used to train the defense. The episode digs into indirect prompt injection as a growing threat vector for agentic systems with tool access, and discusses why Gray Swan’s Series A — backed in part by Snowflake — positions them at the intersection of enterprise AI deployment and security infrastructure.

📺 Source: Latent Space · Published June 22, 2026
🏷️ Format: Interview

Tags

Amazon Anthropic Claude Code Claude Mythos Claude Opus 4.7 Codex OpenClaw Signal Snowflake

Prev

Ponytail + OpenClaw + Ollama: 20K Tokens to 2K Tokens – Don’t Overbuild

Next

How to Generate 2+ Minute AI Videos: JoyAI-Echo Complete Guide|Lossless vs. Lite ComfyUI Workflow:

18 Related Posts

Related Posts

02:00:20

Interviews

Claude Fable 5 Is BACK (And It’s Different)

2 days ago

01:18:07

Interviews

Coinbase Cuts AI Spend by 50% | Kalshi’s $40B Valuation & Impending IPO | The Year for SaaS Roll-Ups

2 days ago

44:07

Interviews

Tesla Deliveries Jump 25% | Bloomberg Tech 7/02/2026

2 days ago

05:14

Interviews

Nuclear Reactor Powers Nvidia AI Chip in US First

2 days ago

07:36

Interviews

Microsoft Shifts Strategy on Enterprise AI

2 days ago

01:24:35

Interviews

ARC-AGI-3 Explained by the Team That’s Winning It

3 days ago