Are AI Coding Skills Just Hype? I Tested Them

Description:

Web Dev Cody tackles a question most developers using agentic coding tools have avoided: do AI “skills” — instructional prompt files that guide agents like Claude Code — measurably improve output quality, or are they just bloating the context window? Rather than relying on intuition, Cody builds a custom skill-testing harness that spins up three parallel Claude Code sessions against the same feature prompt, each in an isolated git worktree to prevent cross-contamination: one session with no skills, one using plan mode only, and one running his hand-crafted skills from agentsystem.dev.
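The harness itself isn't shown line by line in the video, but a minimal sketch of the idea might look like the following. It assumes the Claude Code CLI's non-interactive `claude -p` mode and ordinary `git worktree` commands; the variant names, branch names, paths, and the plan-mode flag are illustrative guesses, not Cody's actual configuration. Run it as an ES module:

```typescript
// A minimal sketch, not Cody's actual harness. Each variant gets its own
// git worktree and its own Claude Code session; all three run in parallel
// against the same feature prompt.
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const exec = promisify(execFile);

const FEATURE_PROMPT =
  "Add a token usage tracking and visualization feature to Mission Control.";

// One entry per parallel session: baseline, plan mode, curated skills.
const variants = [
  { name: "no-skills", extraArgs: [] },
  { name: "plan-mode", extraArgs: ["--permission-mode", "plan"] },
  { name: "agentsystem-skills", extraArgs: [] }, // skills live in this worktree's config
];

async function runVariant(v: (typeof variants)[number]) {
  const dir = `../bench-worktrees/${v.name}`;
  // A dedicated branch and working copy per session means no run can
  // see or overwrite another run's changes.
  await exec("git", ["worktree", "add", "-b", `bench/${v.name}`, dir]);
  const started = Date.now();
  await exec("claude", ["-p", FEATURE_PROMPT, ...v.extraArgs], { cwd: dir });
  const minutes = ((Date.now() - started) / 60_000).toFixed(1);
  console.log(`${v.name}: done in ${minutes} min`);
}

// All three sessions race the same prompt concurrently.
await Promise.all(variants.map(runVariant));
```

Worktrees share the repository's object store but give each session a separate working directory and branch, which is what keeps the three runs from contaminating one another.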

The benchmark task is shipping a token usage tracking and visualization feature inside a real project called Mission Control. Using Claude Opus 4.7 in low-effort mode, the no-skills session finishes in roughly four minutes but produces an incomplete implementation. The plan-mode-only session takes approximately eight minutes and generates more deliberate output. The agentsystem.dev skills session outperforms both; a bar graph generated from AI-assisted diff analysis shows consistent quality improvement across runs.
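The video doesn't show the grading code, but a hedged reconstruction of the AI-assisted diff analysis could look like this: each worktree's diff is graded against a fixed rubric through the official @anthropic-ai/sdk client. The rubric wording, model id, and base branch are all assumptions here, not details taken from the video.

```typescript
// Hypothetical reconstruction of the AI-assisted diff scoring step.
import { execFileSync } from "node:child_process";
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

async function scoreDiff(worktreeDir: string): Promise<number> {
  // Grade the session branch's full diff against the base branch.
  const diff = execFileSync("git", ["diff", "main"], {
    cwd: worktreeDir,
    encoding: "utf8",
  });
  const msg = await client.messages.create({
    model: "claude-sonnet-4-5", // substitute any current model id
    max_tokens: 8,
    messages: [
      {
        role: "user",
        content:
          "Rate this diff from 1 to 10 for completeness and code quality. " +
          `Reply with only the number.\n\n${diff}`,
      },
    ],
  });
  const block = msg.content[0];
  return block.type === "text" ? Number(block.text.trim()) : NaN;
}

const score = await scoreDiff("../bench-worktrees/agentsystem-skills");
console.log(`quality score: ${score}/10`);
```

Single-shot LLM grades are noisy, so averaging several runs per variant before plotting, as the per-run bar graph in the video suggests, makes the comparison more trustworthy.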

Key takeaways include why A/B-testing any skill against a baseline is essential before relying on it in production, how effort levels trade token consumption against accuracy, and why low-effort mode with good skills can outperform high-effort mode without them. Cody also distinguishes his curated agentsystem.dev skill library, which he reviews and quality-tests himself, from the many AI-generated, unaudited skills circulating on platforms like skills.sh.


📺 Source: Web Dev Cody · Published May 02, 2026
🏷️ Format: Benchmark Test
