[State of MechInterp] SAEs in Production, Circuit Tracing, AI4Science, “Pragmatic” Interp — Goodfire

Description:

Mark and Jack from Goodfire, an AI interpretability research company, join Latent Space’s year-end State of the Field series to survey where mechanistic interpretability stands as both a research discipline and a production engineering tool. Jack, a recent PhD graduate who shifted from language model grounding research to interpretability, and Mark, who came from Palantir’s healthcare team, represent Goodfire’s dual focus: foundational research and applied platform development.

The most concrete production example is a deployment with partner Rakuten: rather than using an LLM-as-judge to detect personally identifiable information in customer-agent chat transcripts, Goodfire routes transcripts through a “sidecar model” and monitors when PII-related features activate in the model’s internal representations. The result is recall equivalent to GPT-5-as-judge at roughly 500 times lower cost — a compelling demonstration that interpretability techniques can beat prompt-based approaches on both quality and economics in the right setting.
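The mechanics of this approach can be sketched in miniature. The sketch below is purely illustrative and assumes nothing about Goodfire's actual API: `encode_activations`, `PII_FEATURE_IDS`, and the threshold are all hypothetical stand-ins for running a sidecar model and projecting its activations through a trained sparse autoencoder, then flagging a transcript whenever known PII-related features fire above a cutoff.

```python
import numpy as np

# Hypothetical sketch of feature-based PII detection. None of these names
# come from Goodfire; they illustrate the general shape of the technique.
PII_FEATURE_IDS = [1024, 2048, 4096]  # assumed SAE features tied to PII concepts
THRESHOLD = 0.5                       # assumed activation cutoff

def encode_activations(transcript: str) -> np.ndarray:
    """Stand-in for the real pipeline: run the sidecar model on the
    transcript and encode its residual-stream activations with a trained
    sparse autoencoder. Here we fake a sparse feature vector: mostly
    near-zero entries, with the "PII features" firing when the toy
    keyword check trips."""
    rng = np.random.default_rng(abs(hash(transcript)) % (2**32))
    feats = rng.random(8192) * 0.1  # sparse-ish background activations
    if any(tok in transcript.lower() for tok in ("ssn", "email", "phone")):
        feats[PII_FEATURE_IDS] = 0.9  # pretend the PII features activate
    return feats

def contains_pii(transcript: str) -> bool:
    """Flag a transcript if any monitored feature exceeds the threshold."""
    feats = encode_activations(transcript)
    return bool((feats[PII_FEATURE_IDS] > THRESHOLD).any())

print(contains_pii("My SSN is 123-45-6789"))  # → True
print(contains_pii("The weather is nice"))    # → False
```

The economic argument in the episode follows from this structure: once the features are identified, each transcript costs one forward pass through a small sidecar model plus a threshold check, versus a full generation from a frontier LLM judge.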

The conversation also covers Goodfire’s paint.goodfire.ai demo (using unsupervised sparse autoencoder features to enable direct concept-space painting inside Stable Diffusion XL Turbo), Anthropic’s circuit tracing paper, the science of how models memorize training data and what that means for privacy, and early work applying interpretability to narrowly superhuman scientific models in genomics, proteomics, and materials science — domains where the models are superhuman but completely opaque, making interpretability tools uniquely valuable.


📺 Source: Latent Space · Published December 31, 2025
🏷️ Format: Deep Dive