Fast Models Need Slow Developers — Sarah Chieng, Cerebras

Description:

Sarah Chieng, Head of Developer Experience at Cerebras, argues that the arrival of ultra-fast AI coding models demands a fundamental rethink of how developers work, and she delivers a practical playbook to get there. The centerpiece is Codex Spark, co-released by Cerebras and OpenAI, which generates code at 1,200 tokens per second. That is roughly 20 times faster than Claude Sonnet, Opus, or GPT-4o, which top out around 40–60 tokens per second.
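To make the speed gap concrete, here is a small back-of-the-envelope sketch using only the rates quoted above. The 2,000-token response size is an assumption chosen for illustration, not a figure from the talk, and the labels for the slower models are placeholders for the 40–60 tokens-per-second range.

```python
# Illustrative arithmetic only. RESPONSE_TOKENS is an assumed size for one
# generated change; the rates are the figures quoted in the description.
RESPONSE_TOKENS = 2_000

rates_tps = {
    "Codex Spark": 1_200,
    "slower model (high end)": 60,
    "slower model (low end)": 40,
}


def seconds_to_generate(tokens: int, tokens_per_second: float) -> float:
    """Time to stream `tokens` at a constant decode rate."""
    return tokens / tokens_per_second


for name, rate in rates_tps.items():
    secs = seconds_to_generate(RESPONSE_TOKENS, rate)
    ratio = rates_tps["Codex Spark"] / rate
    print(f"{name}: {secs:.1f} s for {RESPONSE_TOKENS} tokens "
          f"(Codex Spark rate is {ratio:.0f}x this rate)")
```

Under these assumptions the same response takes under two seconds at the fast rate and most of a minute at the slow one, which is the gap between waiting on the model and the model waiting on the reviewer.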

Chieng’s core thesis is that habits formed during the slow-inference era (writing massive one-shot prompts, running 500-agent swarms, making enormous commits) will simply produce bad code faster unless developers change their behavior. Her playbook centers on a tiered orchestration model: use a high-intelligence model like GPT-5.4 for planning and long-horizon tasks, then delegate execution to Codex Spark; capture successful agent trajectories as reusable, verifiable skills; and resist unverified parallel swarms, which only accelerate the accumulation of technical debt.
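The tiered model described above can be sketched roughly as follows. This is a minimal illustration, not the workflow shown in the talk: every name here (`SkillLibrary`, `run`, the stub functions) is hypothetical, and the planner and executor are plain callables standing in for a high-intelligence model and a fast model. No real model API is called.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical interfaces: each is a stand-in for a model call or a check.
Planner = Callable[[str], List[str]]   # goal -> ordered list of small steps
Executor = Callable[[str], str]        # step -> candidate code
Verifier = Callable[[str, str], bool]  # (step, code) -> did it pass checks?


@dataclass
class SkillLibrary:
    """Stores step -> code pairs that have passed verification."""
    skills: Dict[str, str] = field(default_factory=dict)

    def lookup(self, step: str) -> Optional[str]:
        return self.skills.get(step)

    def capture(self, step: str, code: str) -> None:
        self.skills[step] = code


def run(goal: str, plan: Planner, execute: Executor, verify: Verifier,
        library: SkillLibrary, max_attempts: int = 2) -> List[str]:
    """Plan once with the slow model, execute each step with the fast one,
    and accept nothing that has not passed verification."""
    accepted: List[str] = []
    for step in plan(goal):
        cached = library.lookup(step)
        if cached is not None and verify(step, cached):
            accepted.append(cached)  # reuse a captured skill, re-verified
            continue
        for _ in range(max_attempts):
            code = execute(step)
            if verify(step, code):
                library.capture(step, code)  # keep the successful trajectory
                accepted.append(code)
                break
        else:
            # Stop instead of piling unverified output on top of a failure.
            raise RuntimeError(f"step failed verification: {step!r}")
    return accepted


# Stubs so the sketch runs without any model behind it.
def stub_plan(goal: str) -> List[str]:
    return [f"{goal}: write function", f"{goal}: write test"]


def stub_execute(step: str) -> str:
    return f"# code for {step}"


def stub_verify(step: str, code: str) -> bool:
    return step in code
```

Calling `run("add parser", stub_plan, stub_execute, stub_verify, SkillLibrary())` walks both planned steps and fills the library. The design choice worth noting is the verification gate on every step, including reused skills: a fast executor makes retries cheap, so the loop can afford to reject output rather than merge it unchecked.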

The talk also explains why this speed shift is structural rather than temporary. Chieng covers the hardware reasons behind faster inference: Cerebras’s wafer-scale SRAM architecture, which eliminates off-chip HBM bottlenecks; disaggregated prefill/decode inference, now commercially viable, which she links to Nvidia’s $20 billion Groq acquisition; and ongoing optimization across the full inference stack. For developers preparing for a world where models generate code faster than humans can review it, this session offers both the mental model and the concrete workflow changes needed to stay in control.


📺 Source: AI Engineer · Published May 22, 2026
🏷️ Format: Hands On Build
