Description:
Ben AI demonstrates how to adapt Andrej Karpathy's Auto Research framework, originally developed for optimizing machine learning pipelines, for use with Claude Code and Claude Cowork to build self-improving AI agents for real content and workflow tasks. Use cases include LinkedIn post writing skills, newsletter subject line optimization, landing page copy, and CLAUDE.md knowledge routing.
The framework operates as an autonomous loop: a main orchestrator agent proposes a hypothesis for improvement, a sub-agent runs a blind test using the updated skill or prompt, and a separate evaluation layer scores the result. Evaluation can be deterministic (a Python script checking a binary condition) or handled by an LLM judge sub-agent when the criterion is too nuanced for code. If the change improves the baseline, it is kept; otherwise it is discarded. The loop continues until a target score is reached or a maximum iteration count is hit.
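A minimal sketch of that loop, written in Python for illustration only: the `propose`, `blind_test`, and `score` callables stand in for the orchestrator agent, the blind-testing sub-agent, and the evaluation layer described above, and their names and signatures are assumptions rather than the framework's actual interfaces.

```python
from typing import Callable

def optimize(
    skill: str,
    propose: Callable[[str, float], str],    # orchestrator: hypothesis -> edited skill/prompt
    blind_test: Callable[[str], list[str]],  # sub-agent: run the candidate skill blind on held-out tasks
    score: Callable[[list[str]], float],     # evaluator: deterministic script or LLM judge
    target: float = 1.0,
    max_iters: int = 10,
) -> tuple[str, float]:
    """Keep a proposed change only if it beats the current baseline score."""
    best = score(blind_test(skill))
    for _ in range(max_iters):
        if best >= target:
            break                                   # target score reached
        candidate = propose(skill, best)            # hypothesis for improvement
        candidate_score = score(blind_test(candidate))
        if candidate_score > best:                  # improvement: keep the change
            skill, best = candidate, candidate_score
        # otherwise the candidate is discarded and a new hypothesis is tried
    return skill, best
```

The structure mirrors the description: the loop terminates either when the target score is reached or when the maximum iteration count is exhausted.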
Concrete results shown include a LinkedIn writing skill improving from 80% to 100% compliance across two custom criteria in five autonomous iterations, a more complex multi-criteria optimization that improved on its 68% baseline by 27% over ten iterations, and a CLAUDE.md routing optimization achieving a 9.9% gain in five iterations. The video emphasizes that optimization quality is bounded by criterion precision: criteria must produce a true/false result, specifying exact conditions such as character counts or named formats rather than vague quality goals. A hedged example of what such deterministic, binary criteria could look like follows below.
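The sketch below shows deterministic, code-checkable criteria of the kind the video calls for. The specific checks (hook length, closing question) and thresholds are illustrative assumptions, not the criteria used in the video; the point is that each one returns an unambiguous True/False.

```python
def hook_under_200_chars(post: str) -> bool:
    """Pass if the first line (the hook) is at most 200 characters. (Assumed threshold.)"""
    lines = post.splitlines()
    return bool(lines) and len(lines[0]) <= 200

def ends_with_question(post: str) -> bool:
    """Pass if the post closes with a question, a named format that code can verify."""
    return post.rstrip().endswith("?")

def compliance(post: str) -> float:
    """Fraction of binary criteria the post satisfies: the score the loop optimizes."""
    checks = [hook_under_200_chars(post), ends_with_question(post)]
    return sum(checks) / len(checks)
```

Criteria that an LLM judge sub-agent handles instead of a script would follow the same contract: a nuanced question phrased so the judge can still answer it with a single pass/fail.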
📺 Source: Ben AI · Published April 07, 2026
🏷️ Format: Workflow Case Study
