Qwen 3.5 – The next NEXT model

Description:

Sam Witteveen breaks down Qwen 3.5, the latest flagship from Alibaba’s Qwen team, a 397-billion-parameter mixture-of-experts model with only 17 billion parameters active at inference time. The headline number is a claimed 19x decoding speed improvement over the previous Qwen Max at 256k context length — and even a 7.2x speed advantage over the much smaller Qwen 3 235B model. Witteveen traces these gains to two architectural changes: an attention mechanism redesigned for long-context efficiency and a shift from single-token to multi-token prediction during pre-training.
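The multi-token prediction idea mentioned above can be sketched in a few lines: instead of a single output head predicting only token t+1, several small heads predict tokens t+1 through t+K from the same hidden state, and their cross-entropy losses are summed. This is a toy illustration with made-up shapes and names, not Qwen's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy multi-token prediction (MTP): K heads predict offsets t+1 .. t+K
# from one hidden state; the training loss sums their cross-entropies.
VOCAB, HIDDEN, K = 100, 32, 3

hidden = rng.normal(size=(HIDDEN,))          # hidden state at position t
heads = rng.normal(size=(K, VOCAB, HIDDEN))  # one output projection per offset
targets = [5, 17, 42]                        # gold tokens at t+1, t+2, t+3

loss = 0.0
for k in range(K):
    probs = softmax(heads[k] @ hidden)       # distribution over the vocab
    loss += -np.log(probs[targets[k]])       # cross-entropy for offset k+1

print(loss > 0)
```

At inference time, the extra heads can be used to draft several tokens per forward pass and verify them, which is one common route to the kind of decoding speedups the video describes.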

Beyond speed, the video highlights that Qwen 3.5 is natively multimodal — trained from scratch on text and images rather than bolting a vision encoder onto an existing language model. This makes its vision benchmark performance meaningfully stronger than prior Qwen VL releases, and Witteveen places it as competitive with GPT-5.2 and ahead of Claude Opus 4.5 on several multimodal tasks. The model also scales multilingual support from 119 to over 200 languages, backed by a new 250K-token vocabulary that improves tokenization efficiency for non-Western scripts.

Witteveen also discusses the 512-expert architecture (up from 128 in Qwen 3) and speculates on what the team’s aggressive RL training environment scaling — roughly 15,000 environments — signals about the direction of future Qwen releases. For teams evaluating open-weight alternatives to proprietary frontier models, this video provides a substantive technical reference point.
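The sparse expert arithmetic behind "397B total, 17B active" can be illustrated with a minimal router sketch: a gating network scores all experts per token, but only the top-k actually run. The 512-expert count follows the video; top-k of 8 and all shapes are assumptions for illustration, not a confirmed Qwen 3.5 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse mixture-of-experts router: score all experts for one token,
# evaluate only the top-k, and mix their outputs by normalized gate weight.
NUM_EXPERTS, TOP_K, HIDDEN = 512, 8, 16   # TOP_K = 8 is an assumption

x = rng.normal(size=(HIDDEN,))                     # one token's hidden state
gate_w = rng.normal(size=(NUM_EXPERTS, HIDDEN))    # router weights
experts = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.01

scores = gate_w @ x
top = np.argsort(scores)[-TOP_K:]                  # indices of selected experts
weights = np.exp(scores[top])
weights /= weights.sum()                           # softmax over selected only

# Only TOP_K of the NUM_EXPERTS expert networks run for this token.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))

print(len(top), f"{TOP_K / NUM_EXPERTS:.1%}")      # 8 experts, about 1.6% active
```

The point of the sketch is the ratio on the last line: per-token compute scales with the handful of selected experts, not the full parameter count, which is how a 397B model can activate only 17B parameters at inference.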


📺 Source: Sam Witteveen · Published February 17, 2026
🏷️ Format: Deep Dive
