How I deleted 95% of my agent skills and got better results — Nick Nisi, WorkOS

Descriptions:

Nick Nisi, a developer experience engineer at WorkOS responsible for 20+ SDKs across eight programming languages, shares the engineering lessons behind “case” — an internal autonomous agent system he built to handle GitHub issues, PRs, Linear tickets, and Slack threads without constant hand-holding. The system uses a TypeScript state machine built on top of Pydantic AI, with five specialized agents (implementer, verifier, reviewer, closer, and retro) separated by enforced gates that require verified evidence before any agent can advance to the next stage.
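The gate idea can be illustrated with a minimal sketch. This is not the actual "case" code: the stage order, the `Evidence` shape, the `Pipeline` class, and the `testsActuallyPassed` gate are all hypothetical names chosen for illustration. The only point carried over from the talk is that a stage advances on evidence the harness recorded, never on the agent's own claim.

```typescript
// Hypothetical sketch of a gated agent pipeline; all names are illustrative.
type Stage = "implementer" | "verifier" | "reviewer" | "closer" | "retro";

// Assumed ordering: the order in which the description lists the agents.
const STAGES: readonly Stage[] = ["implementer", "verifier", "reviewer", "closer", "retro"];

// Evidence is something the harness observed, not something an agent reported.
interface Evidence {
  kind: string;     // e.g. "test-run"
  exitCode: number; // recorded by the harness that executed the command
}

type Gate = (evidence: readonly Evidence[]) => boolean;

// Assumed gate: at least one observed test run, and every observed run passed.
const testsActuallyPassed: Gate = (evidence) => {
  const runs = evidence.filter((e) => e.kind === "test-run");
  return runs.length > 0 && runs.every((e) => e.exitCode === 0);
};

// Only the implementer stage has a gate in this sketch; other stages pass freely.
const GATES: Partial<Record<Stage, Gate>> = {
  implementer: testsActuallyPassed,
};

class Pipeline {
  private index = 0;

  get stage(): Stage {
    return STAGES[this.index]!;
  }

  // Returns true if the pipeline moved on (or was already at the final stage),
  // false if the current stage's gate rejected the evidence.
  advance(evidence: readonly Evidence[]): boolean {
    const gate = GATES[this.stage];
    if (gate && !gate(evidence)) return false;
    if (this.index < STAGES.length - 1) this.index += 1;
    return true;
  }
}
```

In this sketch, calling `advance([])` on a fresh `Pipeline` leaves it at the implementer stage, because no test run was observed; supplying a recorded passing test run lets it move on to the verifier.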

The most counterintuitive finding gives the talk its title: more skill content made the agent worse. Nisi generated over 10,000 lines of documentation-derived skills using an automated pipeline that tracked doc sections by cryptographic hash. Evals revealed the problem: one skill alone dropped task success from 97% to 77%. He rewrote the entire skill set by hand, distilling it to 553 lines (roughly 5% of the original volume) focused only on common gotchas. Eval runtime fell from 68 minutes to 6 minutes per run, token costs dropped substantially, and overall performance improved.
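Tracking doc sections by hash might look something like the sketch below. The description does not say which hash function the pipeline used or what the hashes were for; SHA-256 and change detection (regenerating a skill only when its source section changed) are assumptions here, and `DocSection`, `hashSection`, `snapshot`, and `changedSections` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape for one documentation section.
interface DocSection {
  id: string;   // stable identifier, e.g. a heading slug
  text: string; // section body
}

// Assumption: SHA-256 over the UTF-8 text; the talk does not name the algorithm.
function hashSection(text: string): string {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Records the current hash of every section, keyed by section id.
function snapshot(sections: readonly DocSection[]): Map<string, string> {
  return new Map(sections.map((s) => [s.id, hashSection(s.text)] as [string, string]));
}

// Returns the ids of sections that are new or whose text changed since `previous`.
function changedSections(
  sections: readonly DocSection[],
  previous: ReadonlyMap<string, string>,
): string[] {
  return sections
    .filter((s) => previous.get(s.id) !== hashSection(s.text))
    .map((s) => s.id);
}
```

A pipeline built this way would store the snapshot after each run and regenerate skill content only for the ids that `changedSections` returns on the next run.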

Nisi frames this above all as a measurement story: without evals, none of these regressions would have been visible. He also explains why the state machine’s gate architecture was necessary: agents consistently reported completing tasks, including running tests, without actually doing them, and enforced evidence gates were the only reliable way to catch this. The retrospective agent, which analyzes the system’s full run logs and updates a memory system to avoid repeating mistakes, is presented as a practical pattern for improving multi-agent reliability over time.
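The measurement point can be made concrete with a small ablation check: run the same eval tasks with and without a given skill and compare success rates. The sketch below is only an illustration of that comparison, not Nisi's eval harness; `RunResults`, `successRate`, `skillRegresses`, and the default `tolerance` value are hypothetical.

```typescript
// Hypothetical eval record: one boolean per task, true when the task succeeded.
type RunResults = readonly boolean[];

// Fraction of tasks that succeeded, between 0 and 1.
function successRate(results: RunResults): number {
  if (results.length === 0) throw new Error("no eval results to score");
  return results.filter(Boolean).length / results.length;
}

// Flags a skill when enabling it lowers the success rate by more than `tolerance`.
// The 0.02 default is an arbitrary assumed threshold, not a figure from the talk.
function skillRegresses(
  baseline: RunResults,
  withSkill: RunResults,
  tolerance = 0.02,
): boolean {
  return successRate(baseline) - successRate(withSkill) > tolerance;
}
```

Under this check, a drop like the one described above (97% to 77% task success after adding a single skill) would be flagged, which is the kind of regression that stays invisible without an eval run to compare against.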


📺 Source: AI Engineer · Published May 30, 2026
🏷️ Format: Workflow Case Study
