AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

AI Security After Codex and Claude Code — Zico Kolter & Matt Fredrikson, Gray Swan

More

Descriptions:

Zico Kolter and Matt Fredrikson — CMU professors and co-founders of AI security startup Gray Swan AI — join the Latent Space podcast to discuss the security landscape for AI agents in the era of widely deployed tools like Codex and Claude Code. The conversation establishes a key framing: AI systems have fundamentally different vulnerability profiles than traditional software. Models can be manipulated in ways analogous to social engineering, and because a small number of foundation models underpin most production deployments, a single discovered exploit can scale across an enormous attack surface simultaneously.

Gray Swan operates on both sides of this problem. Their automated red teaming system, SHADE, now outperforms human red teamers at breaking models — finding jailbreaks and policy violations faster, at greater scale, and with less human involvement. Kolter makes a counterintuitive point: model scale alone does not improve adversarial robustness. Making a model bigger does not make it harder to jailbreak; explicit adversarial training is required, and it must stay current as new attack techniques emerge.

The defensive side is CYGNAL (stylized Signal), a purpose-built filter model that sits between users, LLMs, and tool calls to detect policy violations in real time. Fredrikson explains that the red teaming capability is what makes CYGNAL effective — the same attack scenarios used to find vulnerabilities are used to train the defense. The episode digs into indirect prompt injection as a growing threat vector for agentic systems with tool access, and discusses why Gray Swan’s Series A — backed in part by Snowflake — positions them at the intersection of enterprise AI deployment and security infrastructure.


📺 Source: Latent Space · Published June 22, 2026
🏷️ Format: Interview