Qwen3.5 + Claude Code: Run a Free Local AI Coding Agent


Description:

Fahd Mirza walks through building a fully offline, private AI coding agent by combining Anthropic’s Claude Code with the Qwen3.5 4-billion-parameter model in Q4_K_M quantization, served locally via llama.cpp. The tutorial covers cloning and compiling llama.cpp from source, downloading the GGUF model from Hugging Face with the snapshot download utility, and pointing Claude Code at the local llama.cpp server endpoint; once everything is set up, no Anthropic API key or internet connection is required.
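The setup steps above might look roughly like the following sketch. The repository and model file names, the port, and the CUDA build flag are illustrative placeholders rather than values taken from the video, and the Claude Code environment variables reflect its documented override mechanism, not a confirmed detail of the demo:

```shell
# 1. Clone and compile llama.cpp from source (CUDA build shown; flags may vary)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# 2. Download the Q4_K_M GGUF from Hugging Face via snapshot_download
#    (repo id and filename pattern are placeholders)
python -c "from huggingface_hub import snapshot_download; \
snapshot_download('<org>/<qwen3.5-4b-gguf-repo>', \
allow_patterns=['*Q4_K_M*.gguf'], local_dir='./models')"

# 3. Serve the model on an OpenAI-compatible local endpoint
./build/bin/llama-server -m ./models/<model-file>.gguf --port 8080

# 4. Point Claude Code at the local server instead of Anthropic's API
#    (ANTHROPIC_AUTH_TOKEN is commonly set to a dummy value for local use)
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=local
claude
```

The exact build flags and environment variables depend on the llama.cpp and Claude Code versions in use, so treat this as a map of the workflow rather than copy-paste instructions.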

The demo runs on an Ubuntu system with an NVIDIA GPU, with the model consuming just under 6 GB of VRAM. Mirza prompts Claude Code to generate Python files and test suites, walks through the approval workflow, and shows the completed output. He openly notes that a 4B Q4_K_M model is not suited to production agentic use, framing the exercise instead as proof-of-concept validation that Claude Code can be driven by any OpenAI-compatible local endpoint.
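The "any OpenAI-compatible endpoint" claim rests on the server accepting the standard `/v1/chat/completions` request shape. A minimal standard-library sketch of such a request is below; the base URL and model name are illustrative placeholders, not values confirmed by the video:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example usage against a local llama-server (port is a placeholder):
# payload = build_chat_request("qwen3.5-4b-q4_k_m", "Write a hello-world script.")
# reply = send_chat_request("http://localhost:8080", payload)
```

Because llama.cpp's server speaks this same protocol, any client that can be pointed at a custom base URL, Claude Code included, can talk to the local model.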

This video is particularly useful for developers who want to experiment with Claude Code’s agentic loop without incurring API costs or sending code to external servers. The steps are concise and reproducible, making it a practical starting point for anyone already comfortable with llama.cpp and Hugging Face tooling.


📺 Source: Fahd Mirza · Published March 14, 2026
🏷️ Format: Tutorial Demo
