AI Dev 26 x SF | Erik Thorelli: Deploying AI Code Review at Scale



Description:

Erik Thorelli of CodeRabbit gives a practitioner-level breakdown of what it takes to deploy AI-driven code review at production scale, drawing on CodeRabbit’s production architecture and the hard lessons learned building it. The talk rests on one thesis: as AI code generation removes the historical bottleneck of writing code, code review becomes the new bottleneck, and it needs its own agentic solution.

Thorelli opens with internal data: code written predominantly or exclusively by AI shows roughly a 40% increase in critical bugs and a 70% increase in bugs overall compared to human-written baselines. To convey the scale of the problem, he references Cursor’s public disclosure of over one billion lines of AI-generated code. To ground the business stakes, he cites the widely used figure of $5 million per hour for production downtime.

The technical core of the talk covers CodeRabbit’s eval architecture, which distinguishes three kinds of evals: offline evals (run against fixed datasets), shadow evals (run in parallel with production but not surfaced to users), and online evals (live and driven by user feedback). Thorelli argues that treating every system change as a hypothesis and maintaining continuous deployment pipelines is a competitive advantage in probabilistic AI systems, because shipping 50 changes at once makes regression bisection nearly impossible. He also warns against over-relying on public benchmarks like SWE-bench and advocates domain-specific internal evals instead. The talk includes a live interactive eval segment with the audience, which makes it both entertaining and instructive for engineering teams building or evaluating AI-assisted development pipelines.
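The three eval modes described above can be illustrated with a minimal Python sketch. This is not CodeRabbit's implementation: every name here (`Reviewer`, `offline_eval`, `ShadowRunner`, `online_eval`, and the two toy reviewers) is hypothetical, and the scoring rule (substring match against an expected issue) is an assumption chosen only to keep the example short.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical type: a reviewer maps a diff to a list of review comments.
Reviewer = Callable[[str], List[str]]


def offline_eval(reviewer: Reviewer, dataset: List[Tuple[str, str]]) -> float:
    """Offline eval: score a reviewer on a fixed dataset of (diff, expected_issue).

    Returns the fraction of cases where the expected issue appears in a comment.
    """
    if not dataset:
        return 0.0
    hits = sum(
        any(expected in comment for comment in reviewer(diff))
        for diff, expected in dataset
    )
    return hits / len(dataset)


@dataclass
class ShadowRunner:
    """Shadow eval: run a candidate beside production; surface only production."""

    production: Reviewer
    candidate: Reviewer
    log: List[Dict[str, List[str]]] = field(default_factory=list)

    def review(self, diff: str) -> List[str]:
        prod_out = self.production(diff)
        cand_out = self.candidate(diff)  # recorded for comparison, never shown
        self.log.append({"production": prod_out, "candidate": cand_out})
        return prod_out


def online_eval(feedback: List[bool]) -> float:
    """Online eval: acceptance rate from live feedback (True = comment accepted)."""
    return sum(feedback) / len(feedback) if feedback else 0.0


# Toy reviewers, assumed purely for illustration.
def baseline_reviewer(diff: str) -> List[str]:
    return ["possible null dereference"] if "ptr" in diff else []


def candidate_reviewer(diff: str) -> List[str]:
    extra = ["unchecked error"] if "err" in diff else []
    return baseline_reviewer(diff) + extra


if __name__ == "__main__":
    dataset = [("use ptr", "null dereference"), ("ignore err", "unchecked error")]
    print("offline baseline:", offline_eval(baseline_reviewer, dataset))
    print("offline candidate:", offline_eval(candidate_reviewer, dataset))

    shadow = ShadowRunner(baseline_reviewer, candidate_reviewer)
    print("surfaced to user:", shadow.review("ignore err"))
    print("shadow log:", shadow.log)

    print("online acceptance:", online_eval([True, False, True, True]))
```

In this sketch each candidate change is scored on its own, which matches the talk's point about treating every change as a hypothesis: if a single change moves a metric, the cause is unambiguous, whereas a batch of many changes would have to be bisected to find the regression.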

📺 Source: DeepLearningAI · Published May 21, 2026
🏷️ Format: Workflow Case Study


Tags

Anthropic Claude CodeRabbit Cursor Dispatch GPT-4 GPT-5 OpenAI SWE-bench
