Description:
Erik Thorelli from CodeRabbit gives a detailed, practitioner-level breakdown of what it actually takes to deploy AI-driven code review at production scale. He draws on CodeRabbit's real-world architecture and the hard lessons learned building it. The talk is built around a central thesis: AI code generation has removed the historical bottleneck of writing code, so code review has become the new bottleneck, and it requires its own agentic solution.
Thorelli opens with striking internal data: code written predominantly or exclusively by AI shows roughly a 40% increase in critical bugs and a 70% increase in bugs overall compared with human-written baselines. He references Cursor's public disclosure of over one billion lines of AI-generated code to convey the scale of the problem. He also cites the widely used figure of $5 million per hour of production downtime to ground the business stakes.
The technical meat of the talk covers CodeRabbit's eval architecture, which distinguishes three kinds of evals. Offline evals run against fixed datasets. Shadow evals run in parallel with production but are not surfaced to users. Online evals are live and driven by user feedback. Thorelli argues that it is a competitive advantage for probabilistic AI systems to treat every system change as a hypothesis and to maintain a continuous deployment pipeline. If 50 changes ship at once, it becomes nearly impossible to bisect which one caused a regression. He also warns against over-relying on public benchmarks like SWE-bench and advocates domain-specific internal evals instead. The talk includes a live interactive eval segment with the audience, making it both entertaining and instructive for engineering teams building or evaluating AI-assisted development pipelines.
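The three eval tiers described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CodeRabbit's implementation. A "reviewer" is assumed to be any function that maps a diff string to a list of comment strings. Offline scoring is simplified to recall against hand-labeled expected comments. The names `Reviewer`, `EvalLog`, `offline_eval`, `serve_with_shadow`, `record_feedback`, and `online_acceptance_rate` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical interface: a reviewer maps a diff (as text) to review comments.
Reviewer = Callable[[str], list[str]]


@dataclass
class EvalLog:
    # Shadow records: (diff, production comments, candidate comments).
    shadow: list = field(default_factory=list)
    # Online records: (comment shown to the user, whether the user accepted it).
    online: list = field(default_factory=list)


def offline_eval(reviewer: Reviewer, dataset: list[tuple[str, set[str]]]) -> float:
    """Offline tier: recall of known issues on a fixed, labeled dataset.

    Toy metric (an assumption): exact string match, and noisy extra
    comments are not penalized.
    """
    found = total = 0
    for diff, expected in dataset:
        found += len(set(reviewer(diff)) & expected)
        total += len(expected)
    return found / total if total else 0.0


def serve_with_shadow(
    diff: str, prod: Reviewer, candidate: Reviewer, log: EvalLog
) -> list[str]:
    """Shadow tier: the candidate runs on live input, but its output is only logged."""
    prod_out = prod(diff)
    cand_out = candidate(diff)
    log.shadow.append((diff, prod_out, cand_out))
    return prod_out  # users only ever see the production reviewer's comments


def record_feedback(log: EvalLog, comment: str, accepted: bool) -> None:
    """Online tier: capture a user's reaction to a surfaced comment."""
    log.online.append((comment, accepted))


def online_acceptance_rate(log: EvalLog) -> float:
    """Fraction of surfaced comments that users accepted."""
    if not log.online:
        return 0.0
    return sum(1 for _, accepted in log.online if accepted) / len(log.online)
```

The key design point is in `serve_with_shadow`. The candidate sees real traffic, but only the production output is returned, so a weaker candidate can be measured without users ever seeing its comments. A real deployment would likely run the shadow call asynchronously so it adds no user-facing latency. The sequential call here just keeps the sketch simple.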
📺 Source: DeepLearningAI · Published May 21, 2026
🏷️ Format: Workflow Case Study