Description:
NVIDIA’s Nemotron Cascade 30B-A3B is a mixture-of-experts model with 30 billion total parameters, of which only 3 billion are active on any single forward pass, making it far more compute-efficient than its total parameter count implies. In this hands-on walkthrough, Fahd Mirza installs and runs the model on an NVIDIA A100 80GB GPU, covering the full setup process: creating a conda virtual environment, installing PyTorch and Transformers, downloading the ~50–60GB model weights from Hugging Face, and loading them in a Jupyter notebook. At inference, the model consumes approximately 64GB of VRAM.
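The numbers above line up with simple arithmetic: 30B parameters stored in bf16 (2 bytes each) come to roughly 60GB of weights, which matches both the download size and the ~64GB VRAM observed once KV-cache and runtime overhead are added. A minimal sketch of that estimate, plus a hedged version of the loading step (the Hugging Face model id shown is an assumption, not confirmed in the video):

```python
def weight_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """Rough weight footprint in GB; bf16/fp16 use 2 bytes per parameter."""
    return n_params * bytes_per_param / 1e9

# 30B total parameters in bf16 -> ~60 GB of weights on disk and in VRAM,
# consistent with the ~50-60 GB download and ~64 GB observed at inference
# (the remainder being KV cache and framework overhead).
print(round(weight_gb(30e9)))  # 60


def load_model(model_id: str):
    """Hedged loading sketch: the exact repo id and any custom-code flags
    are assumptions; check the model card before relying on them."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.bfloat16,  # halves memory vs fp32
        device_map="auto",           # place weights on the available GPU(s)
    )
    return tokenizer, model
```

On a single A100 80GB this fits comfortably in bf16; quantized variants would shrink the footprint further at some quality cost.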
A significant portion of the video is devoted to explaining Nemotron Cascade’s multi-stage post-training pipeline, which includes supervised fine-tuning on math, code, science, and tool-use data; instruction-following RL; multi-domain RL across STEM reasoning and structured outputs; and a novel step called Multi-domain On-Policy Distillation (MOPD). In MOPD, the model trains against its own strongest domain-specific checkpoints from earlier in the pipeline: the best math checkpoint teaches math, the best alignment checkpoint teaches alignment, and so on, recovering skills that tend to degrade during RL stages. The pipeline concludes with RLHF, long-context RL, competitive programming RL on 3,500 hard problems, and software engineering agent RL.
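On-policy distillation of this kind typically means the student samples its own outputs and is then pushed toward a teacher's token distribution on those samples, with the teacher chosen per domain. The video does not give the exact loss, so this is a hedged sketch of the usual per-token KL objective, with a hypothetical domain-to-teacher routing table standing in for MOPD's checkpoint selection:

```python
import math

def softmax(logits):
    """Convert raw logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(student_logits, teacher_logits):
    """Per-token forward KL(teacher || student): gradients pull the student
    toward the teacher's distribution on tokens the *student* sampled,
    which is the standard on-policy-distillation setup MOPD resembles."""
    q = softmax(student_logits)
    p = softmax(teacher_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical MOPD-style routing: each prompt's domain selects which
# earlier checkpoint acts as the teacher (names are illustrative only).
teacher_for_domain = {
    "math": "best_math_checkpoint",
    "alignment": "best_alignment_checkpoint",
    "code": "best_code_checkpoint",
}

# Identical distributions give zero loss; divergence gives a positive one.
print(distill_loss([1.0, 2.0, 0.5], [1.0, 2.0, 0.5]))  # 0.0 (up to rounding)
```

The key property, and the stated motivation for MOPD, is that the teacher signal is dense (a full distribution per token) rather than a sparse scalar reward, which is why it can restore skills that RL's reward-only signal let drift.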
The result is a model that achieved gold-medal performance on the International Mathematical Olympiad (IMO) and International Collegiate Programming Contest (ICPC) benchmarks, outperforming models two to four times its size on coding and math, despite being roughly 20 times smaller than the only other model previously reaching that level. Mirza tests it on a complex creative coding prompt and notes strong reasoning performance alongside weaker general knowledge, a known limitation of the base model rather than of the post-training pipeline.
📺 Source: Fahd Mirza · Published March 20, 2026
🏷️ Format: Tutorial Demo
