Description:
Published within hours of Alibaba's Qwen 3.5 small model series going live, this video covers the smallest member of the family: the 0.8B-parameter model. Fahd Mirza received a Telegram alert at 4 a.m. Sydney time and immediately set up a local deployment with vLLM on Ubuntu and an NVIDIA RTX 6000, walking through installation, model serving, and a range of capability tests.
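For readers who want to retrace the serving step, here is a minimal sketch using vLLM's offline Python API. The model ID `Qwen/Qwen3.5-0.8B` is a placeholder guess at the repository name, not confirmed from the video, and the prompt is illustrative.

```python
# Minimal local-serving sketch with vLLM's offline API.
# "Qwen/Qwen3.5-0.8B" is a hypothetical ID; substitute the real repo name.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3.5-0.8B",  # placeholder model ID
    max_model_len=32768,         # the 32K context window used in the video
)
params = SamplingParams(temperature=0.7, max_tokens=512)
outputs = llm.generate(["Explain what a mixture-of-experts layer does."], params)
print(outputs[0].outputs[0].text)
```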
Despite its tiny footprint of under 2GB on disk, the model consumes 44.264GB of VRAM when fully loaded at a 32K context window in full precision; most of that figure reflects vLLM preallocating GPU memory for the KV cache up front rather than the weights themselves, and quantized versions are available for 8GB cards. The broader Qwen 3.5 small series (0.8B, 2B, 4B, 9B) shares a unified hybrid architecture that combines gated delta networks with sparse mixture-of-experts layers and is post-trained with reinforcement learning; it supports 201 languages natively and ships under Apache 2.0 with base model weights included for fine-tuning.
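A back-of-the-envelope check makes the gap between disk size and VRAM plausible, assuming bf16 weights and vLLM's default gpu_memory_utilization of 0.9 on a 48GB card (both assumptions, not stated in the video):

```python
# Weights alone are small; vLLM's upfront memory claim dominates.
params = 0.8e9                 # 0.8B parameters
bytes_per_param = 2            # assuming bf16/fp16 "full precision"
print(f"weights: ~{params * bytes_per_param / 1e9:.1f} GB")  # ~1.6 GB

# Assuming a 48GB RTX 6000 and vLLM's default gpu_memory_utilization=0.9,
# preallocation is ~43.2GB, close to the reported 44.264GB.
print(f"preallocated: ~{48 * 0.9:.1f} GB")
```

If this holds, the footprint on smaller cards can also be capped by passing a lower gpu_memory_utilization to vLLM, not only by quantization.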
The live testing is candid about limitations: one-shot HTML code generation is impressive (a fully animated Bitcoin mining terminal with no external dependencies), but multilingual number spelling across 50+ languages reveals clear weaknesses, with most low-resource languages failing while major European languages perform reasonably well. For developers evaluating edge AI or lightweight agent foundations, the 0.8B model offers a commercially open, multimodal starting point, though multilingual use cases should be validated carefully against the target language families.
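The number-spelling test is easy to reproduce against a locally served instance over vLLM's OpenAI-compatible endpoint; the sketch below assumes the server default of http://localhost:8000/v1, reuses the placeholder model ID, and picks a few example languages rather than the video's full 50+ set.

```python
# Probe multilingual number spelling via the OpenAI-compatible API
# exposed by `vllm serve` (default endpoint assumed).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

languages = ["French", "German", "Swahili", "Welsh", "Quechua"]
for lang in languages:
    resp = client.chat.completions.create(
        model="Qwen/Qwen3.5-0.8B",  # placeholder; use the served model's name
        messages=[{"role": "user",
                   "content": f"Spell out the number 347 in words in {lang}."}],
        max_tokens=64,
    )
    print(lang, "->", resp.choices[0].message.content.strip())
```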
📺 Source: Fahd Mirza · Published March 02, 2026
🏷️ Format: Tutorial Demo