This Mutant AI Model Should Not Exist: Qwopus-GLM-18B-Merged Locally

Description:

Fahd Mirza walks through the creation and live testing of Qwopus-GLM-18B-Merged, a community-built model that stitches together two separate 9-billion-parameter models — one optimized for coding and tool use, the other for structured reasoning — into a single 18-billion-parameter architecture. The merge was done without a research lab or significant compute budget, using a technique sometimes called “Franken-merging”: all 32 layers from each base model are stacked, and a short “heal fine-tune” of 1,000 training steps teaches the two halves to communicate across what would otherwise be a broken seam. Training loss dropped 39% through this process, and code output quality improved from garbled to production-quality.
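For intuition, here is a rough Python sketch of the layer-stacking step, assuming Llama-style checkpoints loadable with Hugging Face transformers. The repository names `coder-9b` and `reasoner-9b` are placeholders, not the actual base models, and a production merge would typically use dedicated tooling such as mergekit's passthrough method rather than this naive splice.

```python
# Naive "Franken-merge" sketch: stack the full decoder stacks of two bases.
# Placeholder repo IDs; assumes a Llama-style layout where the decoder
# layers live at model.model.layers.
import torch
from transformers import AutoModelForCausalLM

coder = AutoModelForCausalLM.from_pretrained("coder-9b", torch_dtype=torch.bfloat16)
reasoner = AutoModelForCausalLM.from_pretrained("reasoner-9b", torch_dtype=torch.bfloat16)

# Keep the coder's embeddings and output head; splice the reasoner's 32
# decoder layers on top of the coder's 32, giving a 64-layer stack.
merged = coder
merged.model.layers = torch.nn.ModuleList(
    list(coder.model.layers) + list(reasoner.model.layers)
)
merged.config.num_hidden_layers = len(merged.model.layers)

# At this point the seam between layer 31 and layer 32 is "broken": the
# second half has never seen the first half's activations. The short heal
# fine-tune (1,000 steps in the video) is what teaches them to communicate.
merged.save_pretrained("qwopus-glm-18b-raw")
```

In practice, per-layer bookkeeping such as attention layer indices also needs renumbering for KV caching to work, which is part of what dedicated merge tooling handles.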

The practical result is a 9.2 GB model that, according to benchmarks cited in the video, outperforms Qwen’s 35-billion-parameter model (22 GB) on tool calling, reasoning, code generation, and agentic tasks — at less than half the file size. Mirza runs the model locally using llama.cpp on Ubuntu with just over 14 GB of VRAM, making it accessible on a range of consumer GPUs.
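A minimal sketch of that local setup, using llama-cpp-python (the Python bindings for llama.cpp), might look like the following; the GGUF filename is an assumption, not the exact artifact shown in the video.

```python
# Local-inference sketch with llama-cpp-python. Adjust model_path to the
# actual quantized GGUF file for Qwopus-GLM-18B-Merged (name assumed here).
from llama_cpp import Llama

llm = Llama(
    model_path="qwopus-glm-18b-merged-q4_k_m.gguf",  # ~9.2 GB quantized file (assumed name)
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,       # context window; adjust to available VRAM
)

out = llm.create_chat_completion(
    messages=[{"role": "user",
               "content": "Write a Gray-Scott reaction-diffusion demo in one HTML file."}],
    max_tokens=2048,
)
print(out["choices"][0]["message"]["content"])
```

Setting `n_gpu_layers=-1` offloads every layer to the GPU, which is what keeps the quantized model plus its KV cache within the roughly 14 GB VRAM footprint reported in the video.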

Live tests include generating a Gray-Scott reaction-diffusion simulation in a single HTML file with JavaScript, and solving the Tsiolkovsky rocket equation with step-by-step reasoning. Both outputs hold up under scrutiny. The video is a useful demonstration of how open-source model merging can produce capable, efficient models without the resources of a major AI lab.
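For reference, the Gray-Scott system in the first test couples two concentration fields u and v through the update rules in the NumPy sketch below. The video's version is a single HTML/JavaScript file; this Python port uses common textbook parameters rather than whatever values the generated code chose.

```python
# Gray-Scott reaction-diffusion on a periodic grid, explicit Euler stepping.
# Parameter values are standard textbook choices (assumed, not from the video).
import numpy as np

N = 128
Du, Dv, F, k, dt = 0.16, 0.08, 0.035, 0.060, 1.0

u = np.ones((N, N))
v = np.zeros((N, N))
# Seed a small square of the "v" chemical in the center.
u[N//2-5:N//2+5, N//2-5:N//2+5] = 0.50
v[N//2-5:N//2+5, N//2-5:N//2+5] = 0.25

def laplacian(a):
    # Five-point stencil with wrap-around (periodic) boundaries.
    return (np.roll(a, 1, 0) + np.roll(a, -1, 0) +
            np.roll(a, 1, 1) + np.roll(a, -1, 1) - 4 * a)

for _ in range(5000):
    uvv = u * v * v
    u += (Du * laplacian(u) - uvv + F * (1 - u)) * dt
    v += (Dv * laplacian(v) + uvv - (F + k) * v) * dt

print(f"u range: {u.min():.3f}..{u.max():.3f}")  # patterns emerge as u and v diverge
```

The second test's equation, Δv = v_e · ln(m0/mf), relates achievable velocity change to exhaust velocity and the wet-to-dry mass ratio, so a step-by-step solution mainly needs a correct mass-ratio logarithm.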


📺 Source: Fahd Mirza · Published April 26, 2026
🏷️ Format: Hands-On Build
