Qwen3.5 + Claude Code: Run a Free Local AI Coding Agent


Description:

Fahd Mirza walks through building a fully offline, private AI coding agent by combining Anthropic’s Claude Code with the Qwen3.5 4-billion-parameter model in Q4_K_M quantization, served locally via llama.cpp. The tutorial covers cloning and compiling llama.cpp from source, downloading the GGUF model from Hugging Face with the snapshot download utility, and pointing Claude Code at the local llama.cpp server endpoint; once everything is set up, no Anthropic API key or internet connection is required.
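The setup steps above might look roughly like the following sketch. The repository and model file names, the port, and the CUDA build flag are illustrative placeholders rather than values taken from the video, and the Claude Code environment variables reflect its documented override mechanism, not a confirmed detail of the demo:

```shell
# 1. Clone and compile llama.cpp from source (CUDA build shown; flags may vary)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j

# 2. Download the Q4_K_M GGUF from Hugging Face via snapshot_download
#    (repo id and filename pattern are placeholders)
python -c "from huggingface_hub import snapshot_download; \
snapshot_download('<org>/<qwen3.5-4b-gguf-repo>', \
allow_patterns=['*Q4_K_M*.gguf'], local_dir='./models')"

# 3. Serve the model on an OpenAI-compatible local endpoint
./build/bin/llama-server -m ./models/<model-file>.gguf --port 8080

# 4. Point Claude Code at the local server instead of Anthropic's API
#    (ANTHROPIC_AUTH_TOKEN is commonly set to a dummy value for local use)
export ANTHROPIC_BASE_URL=http://localhost:8080
export ANTHROPIC_AUTH_TOKEN=local
claude
```

The exact build flags and environment variables depend on the llama.cpp and Claude Code versions in use, so treat this as a map of the workflow rather than copy-paste instructions.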

The demo runs on an Ubuntu system with an NVIDIA GPU, with the model consuming just under 6 GB of VRAM. Mirza prompts Claude Code to generate Python files and test suites, walks through the approval workflow, and shows the completed output. He openly notes that a 4B Q4_K_M model is not suited to production agentic use, framing the exercise instead as proof-of-concept validation that Claude Code can be driven by any OpenAI-compatible local endpoint.
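The "any OpenAI-compatible endpoint" claim rests on the server accepting the standard `/v1/chat/completions` request shape. A minimal standard-library sketch of such a request is below; the base URL and model name are illustrative placeholders, not values confirmed by the video:

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str) -> dict:
    """Build a minimal OpenAI-style chat completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def send_chat_request(base_url: str, payload: dict) -> dict:
    """POST the payload to an OpenAI-compatible /v1/chat/completions endpoint."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


# Example usage against a local llama-server (port is a placeholder):
# payload = build_chat_request("qwen3.5-4b-q4_k_m", "Write a hello-world script.")
# reply = send_chat_request("http://localhost:8080", payload)
```

Because llama.cpp's server speaks this same protocol, any client that can be pointed at a custom base URL, Claude Code included, can talk to the local model.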

This video is particularly useful for developers who want to experiment with Claude Code’s agentic loop without incurring API costs or sending code to external servers. The steps are concise and reproducible, making it a practical starting point for anyone already comfortable with llama.cpp and Hugging Face tooling.


📺 Source: Fahd Mirza · Published March 14, 2026
🏷️ Format: Tutorial Demo
