Goodfire AI’s Bet: Interpretability as the Next Frontier of Model Design — Myra Deng & Mark Bissell


Description:

Myra Deng and Mark Bissell from Goodfire AI join Latent Space to discuss the company’s mechanistic interpretability research and announce their $150 million Series B at a $1.25 billion valuation — making Goodfire one of the first interpretability-focused AI startups to reach unicorn status. The conversation covers what Goodfire means by interpretability: not merely a safety technique, but a broad methodology for understanding internal model representations and bringing that understanding directly into training.

A central technical focus is sparse autoencoders (SAEs), which Goodfire uses to decompose model activations into interpretable, human-understandable features. Bissell demonstrates real-time steering of a trillion-parameter model — adjusting feature vectors live during inference — and explains the tradeoffs between SAE-based unsupervised discovery and targeted probe-based approaches for specific behaviors like hallucination suppression. The team notes that SAEs are powerful for open-ended exploration, but supervised probes often outperform them when targeting a well-defined behavior.
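To make the SAE steering idea concrete, here is a minimal, hypothetical sketch of the general technique discussed in the episode: encode an activation into a sparse feature basis, scale one feature, and decode back. The class names, dimensions, and feature index below are illustrative assumptions, not Goodfire's actual stack or API.

```python
# Hypothetical sketch of SAE-style feature steering; dimensions, names,
# and values are illustrative assumptions, not Goodfire's implementation.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Decomposes a model activation into an overcomplete, sparse feature basis."""
    def __init__(self, d_model: int, d_features: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)
        self.decoder = nn.Linear(d_features, d_model)

    def encode(self, x: torch.Tensor) -> torch.Tensor:
        # ReLU keeps feature activations non-negative; sparsity is typically
        # encouraged during training (e.g. with an L1 penalty).
        return torch.relu(self.encoder(x))

    def decode(self, f: torch.Tensor) -> torch.Tensor:
        return self.decoder(f)

def steer(sae: SparseAutoencoder, activation: torch.Tensor,
          feature_idx: int, strength: float) -> torch.Tensor:
    """Scale one interpretable feature and reconstruct the activation.

    In a live setting, the steered reconstruction would be patched back
    into the model's residual stream at inference time.
    """
    features = sae.encode(activation)
    features[..., feature_idx] *= strength  # amplify (>1) or suppress (<1) the feature
    return sae.decode(features)

# Toy usage with random weights; a real SAE is trained to reconstruct
# activations from a specific layer of a specific model.
sae = SparseAutoencoder(d_model=512, d_features=4096)
act = torch.randn(1, 512)  # stand-in for a residual-stream activation
steered = steer(sae, act, feature_idx=123, strength=3.0)
print(steered.shape)  # torch.Size([1, 512])
```

The supervised alternative mentioned in the episode would replace the unsupervised SAE with a probe (e.g. a linear classifier) trained on labeled examples of the target behavior, trading open-ended feature discovery for precision on one behavior.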

The episode covers Goodfire’s production deployments in life sciences and healthcare, their ongoing hallucination detection research, and the broader thesis that interpretability will become the next frontier of model design — enabling developers to understand, debug, and customize AI behavior at a level that black-box fine-tuning cannot reach. For anyone tracking the commercialization of AI safety research or the emerging field of mechanistic interpretability, this episode offers a detailed inside look at one of the leading companies in the space.


📺 Source: Latent Space · Published February 05, 2026
🏷️ Format: Interview