Ollama Launch + Claude Code + GLM Flash

Description:

Sam Witteveen documents his weekend experiment running Claude Code locally using a newly shipped Ollama feature called Ollama Launch, which provides a streamlined path to connecting local models to Claude Code, OpenCode, Droid, and similar AI coding tools via the Anthropic API. The specific model under test is GLM 4.7 Flash—ZAI’s 30-billion-parameter mixture-of-experts model with 3 billion active parameters, roughly comparable in size to certain Qwen 3 MoE variants—which Witteveen runs on a Mac Mini Pro with 32GB of RAM.
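For context, the connection Ollama Launch streamlines is ordinary environment-variable plumbing: Claude Code honors `ANTHROPIC_BASE_URL`, `ANTHROPIC_AUTH_TOKEN`, and `ANTHROPIC_MODEL`. Below is a minimal sketch of the manual route, assuming Ollama exposes an Anthropic-compatible endpoint on its default port 11434; the token value and the `glm-4.7-flash` tag are placeholders, not confirmed by the video.

```bash
# Manual equivalent of what `ollama launch claude` automates (a sketch;
# the endpoint path and model tag are assumptions).
export ANTHROPIC_BASE_URL="http://localhost:11434"  # Ollama's default port
export ANTHROPIC_AUTH_TOKEN="ollama"                # placeholder; a local server needs no real key
export ANTHROPIC_MODEL="glm-4.7-flash"              # hypothetical tag for GLM 4.7 Flash
claude                                              # start Claude Code against the local endpoint
```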

The tutorial walks through the complete setup: updating Ollama, pulling the GLM 4.7 Flash model, and, critically, raising the default 4,096-token context window to 64K via the app settings. Witteveen explains that without this adjustment Claude Code churns ineffectively, unable to hold enough context for proper tool use or file operations. Running `ollama launch claude` in the terminal then opens the full Claude Code interface pointed at the local model.
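Condensed into commands, the workflow looks roughly like this. The model tag is a guess at how Ollama would name GLM 4.7 Flash, and the context override is shown via the `OLLAMA_CONTEXT_LENGTH` server variable available in recent Ollama builds, a terminal alternative to the app-settings route the video actually uses.

```bash
# 1. Update Ollama (or grab the latest release from ollama.com),
#    then pull the model — "glm-4.7-flash" is a hypothetical tag.
ollama pull glm-4.7-flash

# 2. Raise the default 4,096-token context to 64K. The video does this in
#    the Ollama app settings; OLLAMA_CONTEXT_LENGTH is a server-side
#    alternative (restart the Ollama server after setting it).
export OLLAMA_CONTEXT_LENGTH=65536

# 3. Launch Claude Code wired to the local model.
ollama launch claude
```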

After roughly 90 minutes of real-world testing, Witteveen’s verdict is measured: the setup works in principle, with MCP tool calls successfully picked up, but performance is noticeably slower than Anthropic’s hosted Opus model during both prefill and decoding phases, and tool argument errors appear more frequently—likely a consequence of quantization and constrained context. His conclusion is that Ollama Launch is a meaningful development for the local AI ecosystem, but not yet a viable daily-driver replacement for developers currently on Claude Code subscriptions. He suggests the approach may be better suited to building lightweight local agents, with future models like Gemma and Qwen 4 potentially improving feasibility.
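The "lightweight local agents" angle is easy to picture: if the local server mirrors Anthropic's Messages API, as the Claude Code integration implies, an agent reduces to HTTP calls against localhost. The sketch below is a minimal probe under that assumption; the endpoint path, which headers Ollama actually requires, and the model tag are all unconfirmed.

```bash
# Probe the local endpoint, assuming it mirrors Anthropic's Messages API
# (POST /v1/messages). Path, headers, and model tag are assumptions.
curl -s http://localhost:11434/v1/messages \
  -H "content-type: application/json" \
  -H "x-api-key: ollama" \
  -H "anthropic-version: 2023-06-01" \
  -d '{
    "model": "glm-4.7-flash",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "List the files a build script should ignore."}]
  }'
```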


📺 Source: Sam Witteveen · Published January 25, 2026
🏷️ Format: Tutorial Demo
