Description:
Cole Medin tests Anthropic’s open-source long-running agent harness by running Claude Code for 24 continuous hours with the goal of building a functional clone of Claude.ai — including conversations, file uploads, artifact rendering, and project management. The experiment is grounded in an Anthropic article and companion GitHub repository describing their initializer-coder architecture for extended autonomous development.
The harness works in two stages: an initializer agent reads a product requirements document, generates a feature list JSON file with over 200 structured test cases (each with validation steps and a pass/fail flag), creates a startup script, and bootstraps the project skeleton. Coding agents then run sequentially in fresh context windows, reading a rolling progress summary rather than full conversation history to avoid context rot, and marking features complete as they pass validation.
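The coordination loop described above can be sketched in a few lines. This is a minimal illustration, not the repo's actual code: the feature-list shape, field names, and the `run_coding_agent` stand-in are all assumptions made for clarity.

```python
import json

# Hypothetical feature-list shape: each entry carries validation steps
# and a pass/fail flag, mirroring the initializer's JSON output.
features = [
    {"id": 1, "name": "create conversation", "steps": ["POST /chat", "expect 200"], "passed": False},
    {"id": 2, "name": "upload file", "steps": ["POST /files", "expect stored"], "passed": False},
]

progress_summary = ""  # rolling summary each fresh-context agent reads


def run_coding_agent(feature: dict, summary: str) -> bool:
    """Stand-in for one coder session. It sees only the rolling
    summary, never the full history of earlier sessions, which is
    how the harness avoids context rot. A real harness would call
    the Claude Agent SDK here and run the feature's validation steps."""
    return True  # pretend the feature's validation passed


# Coder agents run sequentially, marking features complete as they pass.
for feature in features:
    if feature["passed"]:
        continue  # already validated in an earlier session
    if run_coding_agent(feature, progress_summary):
        feature["passed"] = True
        progress_summary += f"Completed: {feature['name']}\n"

print(json.dumps(features, indent=2))
```

The key design point is that state lives in the JSON file and the rolling summary, not in any single agent's context window, so each session can start fresh.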
Medin includes an Excalidraw diagram breaking down the full architecture and walks through the Claude Agent SDK Python code that loads prompts from markdown files — making clear the harness is coordination logic and well-structured prompts, not magic. After 24 hours, he evaluates what was and wasn’t completed, giving viewers a grounded sense of what long-running autonomous coding agents can actually deliver today versus the theoretical ceiling.
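The prompts-as-markdown pattern the video highlights can be shown in a short sketch; the file name `initializer.md` and the helper below are illustrative assumptions, not the repository's actual code.

```python
from pathlib import Path
import tempfile


def load_prompt(path: Path) -> str:
    """Read an agent's system prompt from a markdown file, so prompts
    stay editable as plain text outside the Python coordination logic."""
    return path.read_text(encoding="utf-8").strip()


# Demo with a temporary file standing in for a prompts/initializer.md
with tempfile.TemporaryDirectory() as d:
    prompt_file = Path(d) / "initializer.md"
    prompt_file.write_text("# Initializer\nRead the PRD and emit the feature list JSON.")
    system_prompt = load_prompt(prompt_file)

print(system_prompt.splitlines()[0])
```

Keeping prompts in markdown files reinforces Medin's point: the harness is ordinary coordination code plus well-structured prompts, not hidden machinery.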
📺 Source: Cole Medin · Published December 04, 2025
🏷️ Format: Hands-On Build
