Descriptions:
Anthropic has updated the Skill Creator for Claude Code, and Ray Amjad walks through every stage of the new eval-driven workflow for building, testing, and knowing when to retire skills. The video opens by diagnosing a real problem: skills written for older models can actively hurt performance after a model update, overriding capabilities the base model has since internalized — a regression that was previously invisible.
The tutorial introduces a two-category framework for skills. Capability uplift skills fill gaps in model knowledge — Anthropic’s own PDF form-filling skill is the benchmark example — and these carry an implicit retirement date once the base model catches up. Workflow and preference skills encode team-specific processes like NDA review checklists, multi-source weekly reports combining PostHog and Jira data, or release automation workflows.
The new Skill Creator handles the full development lifecycle: writing the skill, generating realistic test cases, spinning up parallel subagents (three with the skill, three without), and running a blind comparator that grades outputs without knowing which version used the skill. Amjad demonstrates this live by building an SEO audit skill, watching six parallel eval runs complete, and reviewing graded results. The video also covers how to use the same A/B framework when upgrading between Claude model versions — such as moving from Opus 4.6 to a future Opus 5 — to determine whether existing skills should be kept, revised, or deleted entirely.
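The blind comparison described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Skill Creator's actual implementation: the `Run` record, the `blind_compare` and `decide` helpers, the toy keyword grader, and the 0.05 margin are all hypothetical names and values chosen for the example. The sketch shows the one idea that matters: the grader receives only the output text, never the label saying whether the skill was used.

```python
import random
from dataclasses import dataclass


@dataclass
class Run:
    arm: str     # "with_skill" or "without_skill" (hidden from the grader)
    output: str  # text produced by one subagent run


def blind_compare(runs, grade, seed=0):
    """Grade runs in shuffled order; the grader only ever sees run.output."""
    shuffled = list(runs)
    random.Random(seed).shuffle(shuffled)
    scores = {}
    for run in shuffled:
        scores.setdefault(run.arm, []).append(grade(run.output))
    # Reveal the arm labels only after grading, to average per arm.
    return {arm: sum(vals) / len(vals) for arm, vals in scores.items()}


def decide(means, margin=0.05):
    """Hypothetical decision rule: the margin and the mapping are assumptions."""
    delta = means["with_skill"] - means["without_skill"]
    if delta > margin:
        return "keep"
    if delta < -margin:
        return "delete"  # the skill is hurting the base model
    return "revise"      # no clear difference either way


# Toy grader (assumption): fraction of SEO checklist terms mentioned.
CHECKLIST = ("title", "meta", "canonical")


def toy_grade(output):
    return sum(term in output for term in CHECKLIST) / len(CHECKLIST)


runs = [
    Run("with_skill", "title meta canonical"),
    Run("with_skill", "title meta"),
    Run("with_skill", "title canonical"),
    Run("without_skill", "title"),
    Run("without_skill", "meta"),
    Run("without_skill", ""),
]

means = blind_compare(runs, toy_grade)
print(means, decide(means))
```

The same comparison could be rerun after a model upgrade: if the "without skill" arm catches up to or overtakes the "with skill" arm, that is the signal the video describes for revising or retiring the skill.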
📺 Source: Ray Amjad · Published March 04, 2026
🏷️ Format: Tutorial Demo







