Description:
Nate B Jones addresses CTOs and engineering leads building production systems on Claude Code and the Anthropic API, arguing that the foundational failure in most AI projects is not model quality or architecture choice but the inability to define what ‘correct’ means before picking tools. The video’s central thesis: correctness is upstream of everything, and teams that skip this step build against a shifting target, changing definitions mid-stream and blaming hallucinations on the model rather than on the reward signals they themselves designed.
The technical argument draws on Goodhart’s Law as it applies to reinforcement learning: proxy metrics for correctness become targets the model learns to satisfy, and they often diverge from the actual intent. Gemini 3’s single-turn optimization at the expense of multi-turn coherence is presented as a concrete fingerprint of this dynamic: the behavior reflects the training signal the model received, not a failure of intelligence. Jones references OpenAI’s published guidance on evaluations and an Anthropic paper on reward hacking to ground the argument in named, citable sources.
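The proxy-versus-intent gap is easy to see in a toy grader. The Python sketch below is purely illustrative and not from the video; the grader functions, keyword list, and refund scenario are hypothetical stand-ins for whatever correctness signal a team actually wires into its evals.

```python
# Illustrative only: a proxy "correctness" grader that an answer can satisfy
# while missing the user's actual intent. All names and data are hypothetical.

def proxy_grader(answer: str, required_keywords: list[str]) -> float:
    """Proxy metric: fraction of required keywords present in the answer."""
    hits = sum(1 for kw in required_keywords if kw.lower() in answer.lower())
    return hits / len(required_keywords)

def intent_grader(answer: str, must_state_refund_denied: bool) -> bool:
    """What the team actually cares about: does the answer reach the right
    policy outcome? A keyword proxy cannot see this."""
    says_denied = "not eligible" in answer.lower() or "cannot refund" in answer.lower()
    return says_denied == must_state_refund_denied

# A keyword-stuffed answer that games the proxy but gives the wrong outcome.
gamed = "Per our refund policy and warranty terms, your refund is approved."
keywords = ["refund", "policy", "warranty"]

print(proxy_grader(gamed, keywords))                          # 1.0 -- proxy says perfect
print(intent_grader(gamed, must_state_refund_denied=True))    # False -- intent violated
```

Optimizing against the proxy alone rewards the keyword-stuffed answer; that divergence between the measured score and the intended outcome is the Goodhart dynamic the video describes.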
The practical framework covers how to structure correctness definitions as multi-dimensional bundles (truthfulness, completeness, tone, policy compliance, cost, auditability) before making architecture decisions about RAG, agent count, or context engineering. For agentic systems that combine structured and unstructured data, Jones addresses the human-in-the-loop calibration problem: when to trust the system over the human operator, and how to run the organizational change-management conversations when those trust boundaries need to shift. The video closes with a note on hallucination causation, framing overconfident output as a systemic response to ambiguous correctness incentives rather than a model defect.
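As a concrete illustration of the bundle idea, here is a minimal Python sketch. The dimension names come from the description above; the thresholds, field types, and passes() gate are assumptions made for illustration, not the framework presented in the video.

```python
# Minimal sketch of a multi-dimensional correctness bundle, assuming a team
# wants per-dimension thresholds rather than one blended score.
# Dimension names follow the video; thresholds and structure are illustrative.
from dataclasses import dataclass

@dataclass
class CorrectnessBundle:
    truthfulness: float       # 0-1, e.g. from a fact-checking eval
    completeness: float       # 0-1, covers all required fields
    tone: float               # 0-1, matches brand / policy voice
    policy_compliance: bool   # hard gate, never averaged away
    cost_usd: float           # per-request spend
    auditable: bool           # trace and sources attached

THRESHOLDS = {"truthfulness": 0.9, "completeness": 0.8, "tone": 0.7}
MAX_COST_USD = 0.05

def passes(bundle: CorrectnessBundle) -> tuple[bool, list[str]]:
    """Evaluate every dimension separately and report which ones failed."""
    failures = []
    for name, threshold in THRESHOLDS.items():
        if getattr(bundle, name) < threshold:
            failures.append(name)
    if not bundle.policy_compliance:
        failures.append("policy_compliance")
    if bundle.cost_usd > MAX_COST_USD:
        failures.append("cost_usd")
    if not bundle.auditable:
        failures.append("auditable")
    return (not failures, failures)
```

Keeping policy compliance and auditability as hard gates rather than folding them into an average is the design point: a blended score is itself a proxy metric that can hide exactly the failures the bundle is meant to surface.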
📺 Source: AI News & Strategy Daily | Nate B Jones · Published December 16, 2025
🏷️ Format: Deep Dive
