Descriptions:
Microsoft Research’s SkillOpt takes a different approach to improving AI agent performance: instead of fine-tuning model weights, it trains a “skill document” — a plain markdown file — using the same optimization loop as neural network training, complete with epochs, batch sizes, learning rates, and validation gates. The model itself never changes; only the instructions it receives evolve. Fahd Mirza walks through what this looks like in practice on a local Ubuntu system with an NVIDIA RTX A6000 (48GB VRAM).
The SkillOpt loop works in four steps: the target model runs a batch of tasks using the current skill document as context (rollout), an optimizer model analyzes failures and proposes patches (the backward pass), patches are aggregated and filtered down to a token budget (analogous to learning rate), and a held-out validation gate accepts or rejects the updated skill. Additional mechanisms include cross-epoch momentum updates to prevent forgetting and a meta-skill that helps the optimizer learn which edit types tend to generalize.
Mirza serves Qwen 3.5 4B locally via vLLM, uses ALFWorld (a text-based household task simulation benchmark) as the test environment, and runs a single-epoch training loop with a batch of four tasks to show a complete cycle. He also documents a disk-space error mid-run and resolves it live. SkillOpt is compatible with any OpenAI-compatible API — including Anthropic, OpenAI, and Azure — making it viable as a lightweight alternative to full fine-tuning for teams that need fast, cost-efficient skill improvement without touching model weights.
📺 Source: Fahd Mirza · Published June 22, 2026
🏷️ Format: Hands On Build







