Your Coding Agent Should Do AI System Engineering — Ben Burtenshaw, Hugging Face

Ben Burtenshaw, an engineer at Hugging Face, argues that coding agents have crossed a capability threshold: they can now tackle AI systems engineering problems, not just application-layer code. His talk at AI Engineer 2026 is structured around three progressively more autonomous challenges, each framed as a boss fight: writing optimized CUDA kernels, autonomously fine-tuning LLMs, and running a multi-agent automated research lab.

On CUDA kernels, Burtenshaw demonstrates Hugging Face’s `kernels` library, which lets agents generate hardware-specific inference optimizations and benchmark them against a compatibility matrix. An agentic workflow targeting Qwen 3 8B on an H100 produced a 94% inference speedup—not state-of-the-art, but representative of low-hanging fruit available when models aren’t optimized for specific GPU generations. He also introduces `upskill`, an open-source evaluation tool that benchmarks multiple models (GPT, Kimi, Haiku) on the same structured task, enabling cost-aware model substitution without accuracy loss.
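The cost-aware substitution idea can be illustrated with a minimal sketch. It is not the `upskill` API: the `ModelResult` record, the `cheapest_within_tolerance` helper, the model names, and every number below are hypothetical, chosen only to show the selection rule (pick the cheapest model whose accuracy on the shared task does not fall below the baseline).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelResult:
    """Hypothetical record of one model's score on a shared, structured task."""
    name: str
    accuracy: float      # fraction of task cases passed, 0.0 to 1.0
    cost_per_run: float  # illustrative cost units, not real pricing


def cheapest_within_tolerance(results, baseline_name, tolerance=0.0):
    """Return the cheapest model whose accuracy is within `tolerance` of the baseline."""
    baseline = next(r for r in results if r.name == baseline_name)
    eligible = [r for r in results if r.accuracy >= baseline.accuracy - tolerance]
    return min(eligible, key=lambda r: r.cost_per_run)


# Made-up results for three placeholder models evaluated on the same task.
results = [
    ModelResult("model-a", accuracy=0.92, cost_per_run=10.0),
    ModelResult("model-b", accuracy=0.92, cost_per_run=3.0),
    ModelResult("model-c", accuracy=0.85, cost_per_run=1.0),
]

strict_choice = cheapest_within_tolerance(results, "model-a")
relaxed_choice = cheapest_within_tolerance(results, "model-a", tolerance=0.1)
print(strict_choice.name, relaxed_choice.name)
```

With `tolerance=0.0` the rule only accepts models that match the baseline, which corresponds to substitution without accuracy loss; a nonzero tolerance trades some accuracy for lower cost.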

The second section shows how a single natural-language prompt can trigger a full LLM fine-tuning run on Hugging Face Hub infrastructure, using either standard HF CLI skills or the Unsloth framework for cheaper compute. The talk closes with AutoLab, a multi-agent research system inspired by Andrej Karpathy’s Auto Research project, which automates hypothesis generation, experimentation, and evaluation in a continuous loop. Burtenshaw argues that the prerequisite for all of this is standardized repositories on the Hub, which give agents reliable surfaces to act on.
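The hypothesis, experiment, evaluation loop can be sketched in a few lines. This is a toy illustration under stated assumptions, not AutoLab's implementation: `propose`, `run_experiment`, and `research_loop` are hypothetical names, the "experiment" is a stand-in function rather than a training run, and the "hypothesis" is simply a new learning rate to try.

```python
import random


def propose(best_lr, rng):
    """Hypothesis step (toy): suggest halving or doubling the best learning rate so far."""
    return best_lr * rng.choice([0.5, 2.0])


def run_experiment(lr):
    """Experiment step (toy stand-in for a training run): loss is lowest at lr = 0.01."""
    return (lr - 0.01) ** 2


def research_loop(start_lr, steps, seed=0):
    """Evaluation step: keep a candidate only if it beats the best loss seen so far."""
    rng = random.Random(seed)
    best_lr, best_loss = start_lr, run_experiment(start_lr)
    history = [(best_lr, best_loss)]
    for _ in range(steps):
        candidate = propose(best_lr, rng)
        loss = run_experiment(candidate)
        history.append((candidate, loss))
        if loss < best_loss:
            best_lr, best_loss = candidate, loss
    return best_lr, best_loss, history


best_lr, best_loss, history = research_loop(start_lr=0.16, steps=20)
print(f"best lr: {best_lr}, loss: {best_loss:.6f}, experiments: {len(history)}")
```

In a multi-agent version, each of the three steps would presumably be handled by a separate agent; the single-process loop above only shows the control flow.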


📺 Source: AI Engineer · Published May 21, 2026
🏷️ Format: Hands On Build
