Descriptions:
Fahd Mirza installs and tests VibeThinker-3B, a reasoning model released by Chinese social media giant Weibo, locally on an NVIDIA RTX A6000 GPU with 48GB of VRAM, serving it via SGLang and running it through a set of real-world tasks. The headline claim is striking: a 3-billion-parameter model posting benchmark scores alongside Claude Opus 4.5, Gemini 3 Pro, and Qwen 2.5, a one-trillion-parameter model from Alibaba, on verifiable math and coding tasks.
The video walks through the model’s four-stage post-training pipeline, built on top of Qwen 2.5 Coder 3B: supervised fine-tuning in two stages; reinforcement learning across math, code, and STEM using a novel algorithm called MGPO (MaxEnt-Guided Policy Optimization); offline self-distillation, where the best reasoning traces are fed back into the model; and a final instruction-following RL stage. The “spectrum-to-signal” principle focuses training on problems the model currently gets right about 50% of the time, avoiding both trivially easy and impossibly hard samples. The model consumes just under 8GB of VRAM for weights alone.
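To make the "about 50%" idea concrete, here is a minimal sketch, not the model's actual training code. It assumes each problem's accuracy is estimated from a batch of sampled rollouts, and it weights the problem by how close that accuracy is to 0.5, measured as the KL divergence from a fair coin. The function names and the `sharpness` parameter are illustrative assumptions, not taken from the video.

```python
import math


def bernoulli_kl(p: float, q: float = 0.5) -> float:
    """KL divergence KL(Bernoulli(p) || Bernoulli(q)), in nats."""
    total = 0.0
    # A zero-probability outcome contributes nothing (0 * log 0 is taken as 0).
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def problem_weight(num_correct: int, num_rollouts: int, sharpness: float = 4.0) -> float:
    """Hypothetical training weight: 1.0 at 50% accuracy, decaying toward 0% and 100%.

    `sharpness` is an assumed knob controlling how quickly the weight falls off.
    """
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    accuracy = num_correct / num_rollouts
    return math.exp(-sharpness * bernoulli_kl(accuracy))


if __name__ == "__main__":
    # Eight rollouts per problem; problems solved half the time get the most weight.
    for correct in (0, 2, 4, 6, 8):
        print(f"{correct}/8 correct -> weight {problem_weight(correct, 8):.4f}")
```

Under this assumed form, problems that are always solved or never solved are not discarded outright; they simply contribute much less to each update than problems near the 50% mark.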
Results are mixed in an instructive way. VibeThinker-3B correctly solves a Voyager signal-travel-time calculation with perfect step-by-step arithmetic and produces a working animated fish simulation in a single HTML file. However, a paleoanthropology question exposes a clear limitation: the math is flawless but the scientific interpretation is wrong, because the model confuses fossil age with migration timing. Mirza is explicit that the model does not replace flagship reasoning models and that small-model reasoning errors require careful verification.
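A signal-travel-time question of the kind described comes down to dividing distance by the speed of light, which makes it easy to check a model's arithmetic independently. The sketch below shows that check. The distance is a placeholder of roughly the right order of magnitude for Voyager 1, not the figure used in the video, and the function name is illustrative.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # exact, by definition of the metre


def signal_travel_time_hours(distance_km: float) -> float:
    """One-way travel time, in hours, of a radio signal over `distance_km`."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    seconds = distance_km / SPEED_OF_LIGHT_KM_S
    return seconds / 3600.0


if __name__ == "__main__":
    # Assumed placeholder distance; substitute the value from the actual prompt.
    example_distance_km = 24.0e9
    hours = signal_travel_time_hours(example_distance_km)
    print(f"One-way signal time: {hours:.2f} hours")
```

A few lines like these are one way to apply the verification Mirza recommends: the arithmetic can be confirmed mechanically, while the interpretation of the result still needs a human reader.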
📺 Source: Fahd Mirza · Published June 16, 2026
🏷️ Format: Hands On Build







