Qwen 3.5 – The next NEXT model

Description:

Sam Witteveen breaks down Qwen 3.5, the latest flagship from Alibaba’s Qwen team, a 397-billion-parameter mixture-of-experts model with only 17 billion parameters active at inference time. The headline number is a claimed 19x decoding speed improvement over the previous Qwen Max at 256k context length — and even a 7.2x speed advantage over the much smaller Qwen 3 235B model. Witteveen traces these gains to two architectural changes: an attention mechanism redesigned for long-context efficiency and a shift from single-token to multi-token prediction during pre-training.
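The multi-token prediction idea mentioned above can be sketched in a few lines: instead of a single output head predicting only token t+1, several small heads predict tokens t+1 through t+K from the same hidden state, and their cross-entropy losses are summed. This is a toy illustration with made-up shapes and names, not Qwen's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy multi-token prediction (MTP): K heads predict offsets t+1 .. t+K
# from one hidden state; the training loss sums their cross-entropies.
VOCAB, HIDDEN, K = 100, 32, 3

hidden = rng.normal(size=(HIDDEN,))          # hidden state at position t
heads = rng.normal(size=(K, VOCAB, HIDDEN))  # one output projection per offset
targets = [5, 17, 42]                        # gold tokens at t+1, t+2, t+3

loss = 0.0
for k in range(K):
    probs = softmax(heads[k] @ hidden)       # distribution over the vocab
    loss += -np.log(probs[targets[k]])       # cross-entropy for offset k+1

print(loss > 0)
```

At inference time, the extra heads can be used to draft several tokens per forward pass and verify them, which is one common route to the kind of decoding speedups the video describes.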

Beyond speed, the video highlights that Qwen 3.5 is natively multimodal — trained from scratch on text and images rather than bolting a vision encoder onto an existing language model. This makes its vision benchmark performance meaningfully stronger than prior Qwen VL releases, and Witteveen places it as competitive with GPT-5.2 and ahead of Claude Opus 4.5 on several multimodal tasks. The model also scales multilingual support from 119 to over 200 languages, backed by a new 250K-token vocabulary that improves tokenization efficiency for non-Western scripts.

Witteveen also discusses the 512-expert architecture (up from 128 in Qwen 3) and speculates on what the team’s aggressive RL training environment scaling — roughly 15,000 environments — signals about the direction of future Qwen releases. For teams evaluating open-weight alternatives to proprietary frontier models, this video provides a substantive technical reference point.
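The sparse expert arithmetic behind "397B total, 17B active" can be illustrated with a minimal router sketch: a gating network scores all experts per token, but only the top-k actually run. The 512-expert count follows the video; top-k of 8 and all shapes are assumptions for illustration, not a confirmed Qwen 3.5 configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sparse mixture-of-experts router: score all experts for one token,
# evaluate only the top-k, and mix their outputs by normalized gate weight.
NUM_EXPERTS, TOP_K, HIDDEN = 512, 8, 16   # TOP_K = 8 is an assumption

x = rng.normal(size=(HIDDEN,))                     # one token's hidden state
gate_w = rng.normal(size=(NUM_EXPERTS, HIDDEN))    # router weights
experts = rng.normal(size=(NUM_EXPERTS, HIDDEN, HIDDEN)) * 0.01

scores = gate_w @ x
top = np.argsort(scores)[-TOP_K:]                  # indices of selected experts
weights = np.exp(scores[top])
weights /= weights.sum()                           # softmax over selected only

# Only TOP_K of the NUM_EXPERTS expert networks run for this token.
y = sum(w * (experts[i] @ x) for w, i in zip(weights, top))

print(len(top), f"{TOP_K / NUM_EXPERTS:.1%}")      # 8 experts, about 1.6% active
```

The point of the sketch is the ratio on the last line: per-token compute scales with the handful of selected experts, not the full parameter count, which is how a 397B model can activate only 17B parameters at inference.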


📺 Source: Sam Witteveen · Published February 17, 2026
🏷️ Format: Deep Dive
