Pi Coding Agent Observability: HTML Specs with Gemini 3.5 Flash and GPT Image 2

Descriptions:

IndyDevDan, an engineer with 15 years of experience, runs a structured comparison of three specification formats for AI coding agents: markdown, HTML, and an enhanced ‘VSpec’ that incorporates visual components. Three parallel Gemini 3.5 Flash agents, one per format, are instrumented with a custom Pi observability dashboard. The experiment is motivated by Anthropic’s viral post on the ‘unreasonable effectiveness of HTML’ and by OpenAI’s GPT Image 2, and asks whether denser, richer specs actually produce better agent behavior once cost and speed are factored in.

Each agent receives the same underlying task: build a planning spec for a ‘Steelman’ product agent that generates UI-backed counter-arguments for investment theses. The observability dashboard streams every event, tool call, and turn from all three agents in real time, making it possible to compare not just final output quality but the internal reasoning process. Results show the markdown agent completed its planning phase in 29 turns, the HTML agent in 25 turns, and the enhanced HTML agent in just 17 turns — though token totals were counterintuitively higher for the markdown agent, suggesting it explored the codebase more thoroughly.
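
The per-agent comparison described above can be sketched as a small event aggregator. This is a minimal illustration only, not the dashboard shown in the video: the `AgentEvent` shape, the `summarize` helper, and the agent labels are all hypothetical, and the only figures taken from the description are the three turn counts. Token counts are left at zero because the description does not give them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class AgentEvent:
    """One streamed event (hypothetical shape, not the real dashboard schema)."""
    agent: str       # assumed labels: "markdown", "html", "vspec"
    kind: str        # "turn" or "tool_call"
    tokens: int = 0  # tokens attributed to this event, if known


def summarize(events):
    """Count turns, tool calls, and tokens per agent."""
    summary = defaultdict(lambda: {"turns": 0, "tool_calls": 0, "tokens": 0})
    for event in events:
        row = summary[event.agent]
        if event.kind == "turn":
            row["turns"] += 1
        elif event.kind == "tool_call":
            row["tool_calls"] += 1
        row["tokens"] += event.tokens
    return dict(summary)


# Turn counts from the description; everything else is placeholder data.
sample = (
    [AgentEvent("markdown", "turn") for _ in range(29)]
    + [AgentEvent("html", "turn") for _ in range(25)]
    + [AgentEvent("vspec", "turn") for _ in range(17)]
)

for agent, row in summarize(sample).items():
    print(agent, row["turns"])
```

Keeping the raw events and deriving the counts from them, rather than storing only totals, is what would let a dashboard like this compare how each agent reached its result as well as how long it took.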

The second half of the video demonstrates the product agent live, generating a bear-case steelman for Apple as an AI distribution play, complete with dynamically generated UI components, including a pie chart of Mac Mini revenue versus Apple’s broader product lines. IndyDevDan’s key thesis is that agent observability is not optional infrastructure; it is the only way to understand why different prompts produce different behaviors at scale.


📺 Source: IndyDevDan · Published June 01, 2026
🏷️ Format: Benchmark Test
