Marco-Nano & Marco-Mini: Alibaba’s Insane Sparse MoE Models: Run Locally

Description:

Fahd Mirza installs and tests two new sparse mixture-of-experts models from Alibaba’s AIDC AI division — Marco-Nano Instruct and Marco-Mini Instruct — running them locally on an NVIDIA RTX 6000 with 48GB VRAM. Both models are built on the same decoder-only transformer architecture with MoE layers upcycled from a Qwen 3.6B base, but their efficiency profiles are strikingly different: Nano has 8 billion total parameters but activates only 0.6 billion per token (a 7.5% activation ratio), while Mini uses 17.3B total parameters with 0.86B activated per token.
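
For reference, the quoted activation ratios can be recomputed directly from the parameter counts above. The short Python sketch below does just that; the figures are as reported in the video, not taken from official model cards.

```python
# Recompute the sparse-MoE activation ratios quoted in the video.
# Parameter counts are as reported there, not from official model cards.
models = {
    "Marco-Nano Instruct": {"total_b": 8.0, "active_b": 0.60},
    "Marco-Mini Instruct": {"total_b": 17.3, "active_b": 0.86},
}

for name, params in models.items():
    ratio = params["active_b"] / params["total_b"]
    print(f"{name}: {params['total_b']}B total, {params['active_b']}B active per token "
          f"-> {ratio:.1%} of parameters touched per token")
```

By this measure Mini is actually the sparser of the two (roughly 5% activated versus Nano's 7.5%), even though it activates more parameters in absolute terms.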

The most interesting finding comes from direct side-by-side comparison. On a structured JSON output task (listing five planets with distances), Nano responded almost instantly with clean, valid output, while Mini took significantly longer, briefly misread the prompt as a single-planet request, self-corrected, and then returned six planets anyway. On multilingual translation across 25 languages, the roles partially reversed: Mini produced richer, more culturally adapted phrasing, while Nano was faster and more consistent but occasionally more literal. A SQL bug-finding task rounds out the evaluation.
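
To try the same structured-output test locally, a minimal Hugging Face transformers sketch along these lines should work. The model id below is an assumption (check the AIDC-AI organization on Hugging Face for the exact repository name), and bfloat16 is used simply because either model fits comfortably in 48 GB of VRAM.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Hypothetical model id -- verify the exact name on the AIDC-AI Hugging Face org.
model_id = "AIDC-AI/Marco-Nano-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # fits easily in 48 GB VRAM at this model size
    device_map="auto",
)

# The structured-output task from the video: five planets with distances, as JSON.
messages = [
    {"role": "user",
     "content": "List five planets and their average distance from the Sun in AU. "
                "Respond with a JSON array of objects with keys 'planet' and 'distance_au'."}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

Greedy decoding (do_sample=False) keeps the comparison deterministic, which makes it easier to judge whether the JSON comes back valid on repeated runs.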

The video makes a useful practical point: in sparse MoE architectures, a larger total parameter count, or even more activated parameters per token, does not automatically translate into better instruction-following or faster inference. For developers evaluating efficient open-weight models for multilingual or structured-output workloads, Marco-Nano's extreme activation sparsity offers a compelling tradeoff worth testing.


📺 Source: Fahd Mirza · Published April 09, 2026
🏷️ Format: Hands On Build
