Stop Fixing Your Claude Skills. Autoresearch Does It For You

Description:

Nick Saraev demonstrates how to apply Andrej Karpathy's recently released Auto Research GitHub repository to create self-improving Claude Code skills. The core problem: Claude Code skills (reusable prompt modules) produce inconsistent output, with Saraev estimating a roughly 30% failure rate for his own skills. Rather than manually debugging prompts, the auto-research approach runs a skill repeatedly against a standardized evaluation set and lets an agent iteratively refine the prompt until measured performance improves.

The method maps directly onto Karpathy's nanoGPT auto-research structure: the `train.py` file corresponds to the skill's markdown file, and `program.md` becomes the agent's instruction prompt. The critical ingredient is an objective metric: a set of binary yes/no evaluation questions that removes subjective judgment. Saraev walks through a concrete example, improving a diagram-generator skill against four criteria: text legibility, adherence to a pastel color palette, a linear left-to-right layout, and absence of numbered ordering.
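The loop described above can be sketched in a few lines. This is a minimal illustration, not code from the auto-research repository: the helper names (`generate`, `refine`), the output fields, and the fixed round count are all assumptions made for the example.

```python
# Hypothetical sketch of the evaluate-and-refine loop. `generate` stands in
# for running the skill on one test case; `refine` stands in for the agent
# proposing a revised prompt. Neither name comes from the actual repo.

def evaluate(output: dict) -> float:
    """Score one diagram output against the four binary yes/no criteria."""
    checks = [
        output["text_legible"],               # is all text readable?
        output["palette"] == "pastel",        # pastel color palette?
        output["layout"] == "left-to-right",  # linear layout?
        not output["numbered"],               # no numbered ordering?
    ]
    return sum(checks) / len(checks)

def mean_score(prompt, test_cases, generate):
    """Average score of a prompt across the standardized evaluation set."""
    return sum(evaluate(generate(prompt, c)) for c in test_cases) / len(test_cases)

def improve_skill(prompt, test_cases, generate, refine, rounds=5):
    """Keep a revised prompt only when its measured score improves."""
    best_prompt = prompt
    best_score = mean_score(prompt, test_cases, generate)
    for _ in range(rounds):
        candidate = refine(best_prompt)
        score = mean_score(candidate, test_cases, generate)
        if score > best_score:
            best_prompt, best_score = candidate, score
    return best_prompt
```

The binary criteria are what make the loop work: because each check is a hard yes/no, the agent's revisions are judged by an objective number rather than by how good the output "feels."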

The same methodology applied to website optimization reduced page load time from 1,100 milliseconds to 67 milliseconds across 67 test iterations, illustrating that the technique extends well beyond prompt tuning to any repeatable process with a measurable output. The full workflow runs inside an IDE called Anti-Gravity. Saraev also notes that accumulated research logs from these runs become a durable asset — transferable to future, more capable models like GPT-6 or Claude Opus 5 to continue where previous iterations left off.


📺 Source: Nick Saraev · Published March 13, 2026
🏷️ Format: Tutorial Demo
