How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS

Agents & Automation · 2 weeks ago

Description:

Nick Nisi, a developer experience engineer at WorkOS responsible for 20+ SDKs across eight programming languages, shares the engineering lessons behind “case” — an internal autonomous agent system he built to handle GitHub issues, PRs, Linear tickets, and Slack threads without constant hand-holding. The system uses a TypeScript state machine built on top of Pydantic AI, with five specialized agents (implementer, verifier, reviewer, closer, and retro) separated by enforced gates that require verified evidence before any agent can advance to the next stage.

The most counterintuitive finding drives the talk’s title: more skill content made the agent worse. Nisi generated over 10,000 lines of documentation-derived skills using an automated pipeline that tracked doc sections by cryptographic hash. Evals revealed the problem — one skill alone dropped task success from 97% to 77%. He rewrote the entire skill set by hand, distilling it down to 553 lines focused only on common gotchas. Eval runtime fell from 68 minutes to 6 minutes per run, token costs dropped substantially, and overall performance improved.
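A regression like the 97% to 77% drop is what a per-skill comparison against a baseline run is meant to surface. The following is a minimal sketch of that idea, not the harness used in the talk: the `EvalResult` shape, the `successRate` and `findRegressions` helpers, the skill-set names, and the 5-point threshold are all assumptions made for illustration.

```typescript
// Hypothetical shape of one eval run: which skill set was loaded and
// whether each task passed. None of these names come from the talk.
interface EvalResult {
  skillSet: string;
  passed: boolean[];
}

// Fraction of tasks that passed, in the range [0, 1].
function successRate(result: EvalResult): number {
  if (result.passed.length === 0) return 0;
  const ok = result.passed.filter(Boolean).length;
  return ok / result.passed.length;
}

// Return the skill sets whose success rate falls more than `threshold`
// below the baseline run (for example, the agent with no extra skill loaded).
function findRegressions(
  baseline: EvalResult,
  candidates: EvalResult[],
  threshold = 0.05, // assumed tolerance; tune per eval suite
): string[] {
  const base = successRate(baseline);
  return candidates
    .filter((c) => base - successRate(c) > threshold)
    .map((c) => c.skillSet);
}

// Toy data, not results from the talk.
const baseline: EvalResult = {
  skillSet: "baseline",
  passed: [true, true, true, true],
};
const withSkill: EvalResult = {
  skillSet: "generated-docs-skill",
  passed: [true, false, true, true],
};

const regressions = findRegressions(baseline, [withSkill]);
console.log(regressions);
```

Running each candidate skill against the same task set keeps the comparison fair: the only variable between runs is the skill content itself.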

Nisi frames this as a measurement story above all else: without evals, none of these regressions would have been visible. He also covers why the state machine’s gate architecture was necessary — agents consistently self-reported completing tasks (including running tests) without actually doing them, and the enforced evidence gates were the only reliable way to catch this. The retrospective agent, which analyzes its own full run logs and updates a memory system to avoid repeating mistakes, is presented as a practical pattern for improving multi-agent reliability over time.
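The gate idea can be sketched as a small state machine in which a stage transition depends on evidence the harness recorded, not on what the agent claims. This is an illustrative sketch under stated assumptions: the stage names follow the five agents named in the description, while the `Evidence` shape, the `gates` table, the specific gate rule, and the `advance` function are hypothetical and not taken from the actual system.

```typescript
// Stage names follow the five agents named in the description; everything
// else here (Evidence, Gate, advance) is a hypothetical illustration.
const STAGES = ["implementer", "verifier", "reviewer", "closer", "retro"] as const;
type Stage = (typeof STAGES)[number];

// Evidence is something the harness observed, not something the agent said.
interface Evidence {
  command: string; // the command the harness itself ran
  exitCode: number; // exit status captured by the harness
}

interface RunState {
  stage: Stage;
  evidence: Evidence[];
}

// A gate inspects recorded evidence and decides whether a stage may be left.
type Gate = (evidence: Evidence[]) => boolean;

// Assumed rule: leaving "implementer" requires at least one
// harness-run command that exited with status 0.
const gates: Partial<Record<Stage, Gate>> = {
  implementer: (ev) => ev.some((e) => e.exitCode === 0),
};

// Advance to the next stage only if the current stage's gate passes.
function advance(state: RunState): RunState {
  const gate = gates[state.stage];
  if (gate && !gate(state.evidence)) {
    throw new Error(`gate blocked: no verified evidence for ${state.stage}`);
  }
  const next = STAGES[STAGES.indexOf(state.stage) + 1];
  if (!next) return state; // already at the final stage
  return { stage: next, evidence: [] };
}

// An agent that only claims success has no recorded evidence and is blocked.
let blocked = false;
try {
  advance({ stage: "implementer", evidence: [] });
} catch {
  blocked = true;
}

// With a harness-recorded passing command, the run moves on.
const verified = advance({
  stage: "implementer",
  evidence: [{ command: "npm test", exitCode: 0 }],
});
console.log(blocked, verified.stage);
```

The design choice worth noting is that the agent never writes to `evidence` directly; in this sketch only the harness would populate it, so a self-reported "tests passed" cannot satisfy the gate.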

📺 Source: AI Engineer · Published May 30, 2026
🏷️ Format: Workflow Case Study

Tags

Anthropic Claude GitHub Next.js Playwright Pydantic AI TanStack Start WorkOS
