Description:
Sunil Pai, who builds agents at Cloudflare, presented “Code Mode” at the AI Engineer conference — a technique that replaces conventional JSON tool-calling with model-generated executable code, dramatically reducing the token overhead of large API surfaces and eliminating the latency penalty of sequential round-trips.
The anchor example is Cloudflare’s own API, which spans approximately 2,600 endpoints — enough to consume 1.2 to 1.5 million tokens if exposed as individual MCP tools. Pai’s colleague Matt Carey instead exposed just two tool calls, search and execute, both accepting code strings as inputs. The model generates JavaScript that runs directly against the API in a single pass, compressing context usage by roughly 99.9% to around 1,000 tokens. Pai demonstrated this live with a DDoS-response scenario: rather than eight sequential API round-trips, the agent generated and executed a single script to identify and block all offending IPs in one shot.
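The talk describes the two-tool surface but not its implementation. A minimal sketch of the shape, in JavaScript since that is what the model generates: only the tool names (`search`, `execute`) come from the talk; the sandbox mechanism, the API facade, and the DDoS script are illustrative stand-ins, not Cloudflare's code.

```javascript
// Illustrative sketch of the two-tool "code mode" surface. Instead of
// ~2,600 individual MCP tools, the agent sees exactly two.
const tools = {
  // search: lets the model find relevant endpoints without loading
  // the full API schema into context.
  search(query) {
    const docs = [
      { path: "/zones/{zone}/firewall/rules", doc: "create firewall rule" },
      { path: "/zones/{zone}/analytics/ips", doc: "list top client ips" },
    ];
    return docs.filter((d) => d.doc.includes(query));
  },

  // execute: runs model-generated JavaScript against a narrow API facade,
  // so a multi-step workflow collapses into one round-trip.
  // (A real sandbox would isolate this; new Function is just the sketch.)
  execute(code, api) {
    const fn = new Function("api", code);
    return fn(api);
  },
};

// Hypothetical stand-in for the API facade the generated code runs against.
const fakeApi = {
  topIps: () => ["203.0.113.7", "203.0.113.9", "198.51.100.4"],
  blocked: [],
  block(ip) { this.blocked.push(ip); },
};

// What a model-generated script for the DDoS scenario might look like:
// identify the offending IPs and block them all in a single execution,
// rather than one tool-call round-trip per IP.
const generated = `
  const offenders = api.topIps().filter((ip) => ip.startsWith("203.0.113."));
  for (const ip of offenders) api.block(ip);
  return offenders;
`;

const blocked = tools.execute(generated, fakeApi);
```

The design point is that `execute` moves the loop from the agent harness (eight sequential round-trips) into the generated code itself (one pass).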
Pai also explored more conceptually unusual territory. Cloudflare Workers creator Kenton Varda built a whiteboard canvas and asked Claude Opus to play tic-tac-toe by reading the canvas’s raw stroke array — no game logic anywhere in the codebase. The model inferred the board state, identified Kenton’s move, and drew a response circle directly onto the canvas. Pai describes this shift as moving from “generating a program” to “inhabiting a state machine,” suggesting that code mode enables a class of emergent agent behavior that pure tool-calling architectures structurally cannot reach.
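To be clear, the demo's whole point is that this inference lived in the model, not the codebase. Purely to make "raw stroke array" concrete, here is a hand-written sketch of the kind of mapping the model performed implicitly: strokes as lists of `{x, y}` points, bucketed onto a 3×3 grid. The canvas size, the stroke format, and the two-strokes-means-X heuristic are all assumptions for illustration, not the demo's logic.

```javascript
// Assumed canvas: 300x300, so each tic-tac-toe cell is 100x100.
const CELL = 100;

// Average position of a stroke's points.
function centroid(points) {
  const sx = points.reduce((s, p) => s + p.x, 0);
  const sy = points.reduce((s, p) => s + p.y, 0);
  return { x: sx / points.length, y: sy / points.length };
}

// Map a stroke to a grid cell by its centroid.
function cellOf(stroke) {
  const c = centroid(stroke);
  return {
    row: Math.min(2, Math.floor(c.y / CELL)),
    col: Math.min(2, Math.floor(c.x / CELL)),
  };
}

// Toy heuristic: two strokes in a cell = "X", one = "O", none = ".".
function inferBoard(strokes) {
  const counts = Array.from({ length: 3 }, () => [0, 0, 0]);
  for (const s of strokes) {
    const { row, col } = cellOf(s);
    counts[row][col] += 1;
  }
  return counts.map((r) => r.map((n) => (n === 0 ? "." : n >= 2 ? "X" : "O")));
}

// Two crossing lines in the center cell (an X), one loop top-left (an O).
const strokes = [
  [{ x: 120, y: 120 }, { x: 180, y: 180 }],
  [{ x: 180, y: 120 }, { x: 120, y: 180 }],
  [{ x: 40, y: 30 }, { x: 60, y: 30 }, { x: 60, y: 60 }, { x: 40, y: 60 }],
];

const board = inferBoard(strokes);
```

In the demo none of this existed; Opus read the stroke coordinates, reconstructed the board, and emitted new strokes for its own move, which is what Pai means by "inhabiting a state machine."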
📺 Source: AI Engineer · Published April 19, 2026
🏷️ Format: Deep Dive
