Description:
Ben AI walks through Skills 2.0, Anthropic’s update to the skill creator system in Claude Cowork and Claude Code, focusing on its newly built-in evaluation and benchmarking capabilities that allow practitioners to iterate on automations significantly faster than before.
Skills in Claude are Markdown-based instruction files that tell Claude what process to follow for a given task—covering everything from step sequences to human-in-the-loop checkpoints and output format. What Skills 2.0 adds is an automated eval loop: after a skill is created, Claude can spin up multiple sub-agents in parallel to run simultaneous test cases, score outputs against user-specified criteria (such as word count ranges, writing style elements, or structural requirements), and generate a structured report across all test runs. Ben demonstrates this with a YouTube-to-newsletter repurposing skill, showing how three parallel test agents each process a different video and return scored results. He also covers A/B testing between skill variants to identify which version performs better on specific criteria.
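The eval loop described above—parallel test cases scored against explicit criteria, then rolled up into a report—can be sketched in plain Python. This is a minimal illustration, not Anthropic's actual Skills 2.0 implementation; the criteria, helper names, and sample drafts are all assumptions standing in for real skill outputs.

```python
# Sketch of a parallel eval loop: run several test cases at once,
# score each output against user-specified criteria, and aggregate
# a structured report. Criteria and names are illustrative only.
from concurrent.futures import ThreadPoolExecutor

# User-defined pass/fail criteria, mirroring the examples in the video
# (word count range, structural requirements, style elements).
CRITERIA = {
    "word_count": lambda text: 300 <= len(text.split()) <= 500,
    "has_subject_line": lambda text: text.lstrip().lower().startswith("subject:"),
    "has_cta": lambda text: "subscribe" in text.lower(),
}

def run_test_case(draft: str) -> dict:
    """Score one generated newsletter draft against every criterion."""
    scores = {name: check(draft) for name, check in CRITERIA.items()}
    return {"scores": scores, "passed": all(scores.values())}

def run_evals(drafts: list[str]) -> dict:
    """Run all test cases in parallel and build a summary report."""
    with ThreadPoolExecutor(max_workers=len(drafts)) as pool:
        results = list(pool.map(run_test_case, drafts))
    return {
        "results": results,
        "pass_rate": sum(r["passed"] for r in results) / len(results),
    }

# Stand-ins for outputs from three parallel test agents.
drafts = [
    "Subject: Skills 2.0 recap\n" + "word " * 350 + "Subscribe for more.",
    "No subject line here. " + "word " * 100,
    "Subject: Weekly digest\n" + "word " * 400 + "Subscribe today!",
]
report = run_evals(drafts)
```

In the real workflow the scoring is done by sub-agents rather than lambdas, but the shape is the same: explicit criteria in, per-test scores and an aggregate pass rate out.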
A central practical lesson is that users should define their evaluation criteria explicitly before running evals—relying on Claude’s auto-generated benchmarks often misses the nuances of a specific use case. Ben also explains a self-learning prompt pattern where skills automatically update their own instructions when a user flags something that should not be repeated. For teams running automations across sales, marketing, and operations, the Skills 2.0 eval loop turns what was previously a slow, manual iteration process into a structured, data-driven optimization cycle.
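The self-learning pattern above—appending a standing correction to the skill's own instruction file when a user flags a mistake—can be sketched as follows. The file layout, section heading, and function name are hypothetical; the actual skill format may differ.

```python
# Sketch of the self-learning prompt pattern: when a user flags an
# unwanted behavior, append a standing correction to the skill's
# Markdown instructions so future runs avoid repeating it.
# Paths and section names are assumptions for illustration.
import tempfile
from pathlib import Path

def record_correction(skill_path: Path, feedback: str) -> None:
    """Append a user-flagged correction under a 'Learned rules' section."""
    text = skill_path.read_text()
    if "## Learned rules" not in text:
        text += "\n## Learned rules\n"
    text += f"- Do NOT repeat: {feedback}\n"
    skill_path.write_text(text)

# Hypothetical skill file for the YouTube-to-newsletter example.
skill = Path(tempfile.gettempdir()) / "newsletter_skill.md"
skill.write_text("# Newsletter skill\nRewrite videos as newsletter issues.\n")
record_correction(skill, "using clickbait subject lines")
```

Because the correction is written into the skill file itself, it persists across sessions without the user having to restate it each time.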
📺 Source: Ben AI · Published March 09, 2026
🏷️ Format: Tutorial Demo