Description:
Aparna Dhinakaran from Arize AI presents a framework called “prompt learning” — a lightweight alternative to reinforcement learning for improving AI coding agents. Drawing on Andrej Karpathy’s concept of system prompt learning, she argues that iteratively refining system prompts with LLM-as-judge feedback is more practical for most teams than full RL training, which demands large datasets and dedicated data science resources.
The talk demonstrates the technique applied to two coding agents: Cline (running Claude Sonnet 4.5) and Claude Code. Starting from vanilla SWE-bench baselines (30% GitHub issue resolution for Cline, 40% for Claude Code), the team ran a multi-step pipeline: execute the agent on coding tasks, evaluate outputs with an LLM judge that generates natural-language explanations of failures, then route those explanations through a meta-prompt to produce updated rules appended to the agent's CLAUDE.md or Cline rules file. No fine-tuning required.
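For readers who want the mechanics, here is a minimal Python sketch of one iteration of that pipeline. The function names, prompt templates, and file paths are illustrative assumptions rather than code from the talk; `run_agent` and `call_llm` stand in for whatever agent harness and LLM client a team already uses.

```python
# Minimal sketch of one prompt-learning iteration, assuming a generic
# agent harness and LLM client. Names, prompts, and paths below are
# illustrative assumptions, not the implementation shown in the talk.
from pathlib import Path

RULES_FILE = Path("CLAUDE.md")  # or the Cline rules file

JUDGE_PROMPT = """You are judging a coding agent's attempt at a GitHub issue.
Issue: {issue}
Agent output: {output}
Did the agent resolve the issue? If not, explain the failure in plain English."""

META_PROMPT = """An LLM judge produced these failure explanations:
{explanations}

Current rules in the agent's rules file:
{rules}

Write new or revised rules that would prevent these failures."""


def run_agent(issue: str) -> str:
    """Run the coding agent (e.g. Cline or Claude Code) on one task."""
    raise NotImplementedError  # wire up your agent harness here


def call_llm(prompt: str) -> str:
    """Call any chat-completion model and return its text response."""
    raise NotImplementedError  # wire up your LLM client here


def prompt_learning_step(issues: list[str]) -> None:
    rules = RULES_FILE.read_text() if RULES_FILE.exists() else ""
    explanations = []
    for issue in issues:
        output = run_agent(issue)
        # LLM-as-judge: rich natural-language feedback, not a scalar reward.
        explanations.append(call_llm(JUDGE_PROMPT.format(issue=issue, output=output)))
    # The meta-prompt turns those explanations into updated rules...
    new_rules = call_llm(
        META_PROMPT.format(explanations="\n\n".join(explanations), rules=rules)
    )
    # ...which are appended to the rules file. No gradient updates involved.
    RULES_FILE.write_text(rules.rstrip() + "\n\n" + new_rules + "\n")
```

Each pass appends judge-derived rules to the rules file, so lessons accumulate in the agent's system prompt across iterations without any model training.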
Dhinakaran contrasts this with RL's scalar reward signal, which she likens to receiving an exam grade with no written feedback, and argues that rich English-language explanations make prompt learning far more sample-efficient. The talk also references the viral leak of Claude's full system prompt to underscore how seriously frontier labs treat prompt engineering as a competitive differentiator, and why teams building on top of these models should too.
📺 Source: AI Engineer · Published December 23, 2025
🏷️ Format: Deep Dive
