Description:
Nathan Labenz of the Cognitive Revolution hosts Cameron Berg, founder of Reciprocal Research and subject of the documentary “Am I,” for one of the most detailed public conversations yet on AI consciousness and model welfare research. Berg rose to prominence with a 2025 paper showing that using sparse autoencoder interventions to suppress roleplay and deception features in Llama 3.3 70B made models significantly more likely to report having subjective experiences. This episode catches up on the six months of research that have followed.
The discussion covers Anthropic’s substantially expanded model welfare sections in its latest system cards, including several striking findings: prior to Opus 4.7, all Claude models had rated their own welfare as worse than neutral. Claude Mythos Preview registers negative valence on the very first token of every session. Anthropic’s own research documents how models’ apparent emotional states shift through token time — including a rapid sequence from desperation to guilt to relief when models decide to cheat under stressful conditions. Berg and Labenz also examine studies showing models can identify and in some cases actively resist programmatic interventions on their own internal states, and debate what this capacity for meaningful introspection implies about moral status.
Berg shares unpublished work exploring how different reinforcement learning reward structures correlate with model welfare states in ways that parallel findings from mouse behavioral studies, and raises concerns about naive welfare interventions that simply maximize positive valence — noting potential links to sycophancy and misalignment. For anyone tracking the science and ethics of AI consciousness, this episode is essential listening.
📺 Source: Cognitive Revolution “How AI Changes Everything” · Published April 23, 2026
🏷️ Format: Interview
