Cosmos 3 – NVIDIA’s World Foundation Model

Sam Witteveen covers the launch of NVIDIA Cosmos 3, announced at the GTC Taipei conference, and describes it as a significant step forward in world foundation models for physical AI. Its predecessors split prediction, transfer, and generation across distinct models. Cosmos 3 instead unifies five modalities (text, images, video, audio, and robotic actions) in a single architecture that can both understand and generate all five.

The architecture uses what NVIDIA calls a mixture of transformers with a dual-tower design: an autoregressive ‘reasoner’ tower handles input understanding, while a diffusion-based ‘generator’ tower handles output synthesis. The two towers share multimodal attention, so the model can go from text or image input to video or action output in a single pass. Cosmos 3 Super uses a 32B-parameter model per tower; Cosmos 3 Nano uses 8B per tower. NVIDIA also references an unreleased edge variant intended for real-time, on-device inference. The model builds heavily on existing components: it uses Qwen3 VL (8B and 32B) and reuses VAEs from Cosmos 1.2.2.
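To make the shared-attention idea concrete, here is a minimal toy sketch in plain Python. It assumes that each tower keeps its own query, key, and value projections while attention runs jointly over the tokens of both towers, which is one common reading of "mixture of transformers". The names `Tower` and `shared_attention` are hypothetical, the weights are random, and the sketch omits everything the summary does not describe (causal masking in the reasoner, the diffusion process, multiple heads and layers). It is not NVIDIA's implementation.

```python
import math
import random


def matvec(weights, vec):
    """Multiply a (d_out x d_in) weight matrix by a length-d_in vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rng, dim):
    """Square matrix of random weights, standing in for trained parameters."""
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]


class Tower:
    """Hypothetical tower that owns its own Q/K/V projection weights."""

    def __init__(self, rng, dim):
        self.wq = random_matrix(rng, dim)
        self.wk = random_matrix(rng, dim)
        self.wv = random_matrix(rng, dim)

    def project(self, token):
        return matvec(self.wq, token), matvec(self.wk, token), matvec(self.wv, token)


def shared_attention(reasoner, generator, reasoner_tokens, generator_tokens):
    """Project each token with its own tower, then attend over all tokens jointly."""
    projected = [reasoner.project(t) for t in reasoner_tokens]
    projected += [generator.project(t) for t in generator_tokens]
    queries, keys, values = zip(*projected)
    dim = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Scaled dot-product scores against keys from BOTH towers.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        attn = softmax(scores)
        weights.append(attn)
        outputs.append([sum(w * v[i] for w, v in zip(attn, values)) for i in range(dim)])
    split = len(reasoner_tokens)
    # Hand each tower back only its own token outputs.
    return outputs[:split], outputs[split:], weights


rng = random.Random(0)
dim = 4
reasoner, generator = Tower(rng, dim), Tower(rng, dim)
text_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]
video_tokens = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(2)]
r_out, g_out, weights = shared_attention(reasoner, generator, text_tokens, video_tokens)
print(len(r_out), len(g_out), len(weights[0]))
```

Each attention row spans all five tokens, so a generator token can draw on reasoner tokens even though the two towers never share projection weights. That cross-tower visibility is the property the paragraph above attributes to the shared multimodal attention.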

Witteveen runs inference on Cosmos 3 Nano on a DGX Spark, demonstrating video generation for robotic arm training data: synthetic sequences showing a cabinet of fruit being manipulated. He highlights NVIDIA’s unusually transparent technical report, which breaks down both pre-training and supervised fine-tuning data sources in detail, and argues that this makes Cosmos 3 a practical starting point for teams that want to fine-tune on their own physical AI datasets.
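A rough back-of-the-envelope estimate suggests why the Nano variant is the natural choice for a DGX Spark, which ships with 128 GB of unified memory. The sketch below makes several assumptions that the summary does not state: the total parameter count is taken to be simply two towers times the per-tower size, weights are held in bf16, and the VAEs, activations, and caches are ignored. The helper `weight_memory_gb` is hypothetical.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "fp8": 1}


def weight_memory_gb(params_per_tower_billions, towers=2, dtype="bf16"):
    """Estimate weight-only memory in GB (1 GB = 1e9 bytes).

    Assumes total parameters = towers * per-tower size, which ignores
    any shared or auxiliary components such as the VAEs.
    """
    total_params = params_per_tower_billions * 1e9 * towers
    return total_params * BYTES_PER_PARAM[dtype] / 1e9


variants = {"Cosmos 3 Nano": 8, "Cosmos 3 Super": 32}
for name, per_tower in variants.items():
    print(f"{name}: ~{weight_memory_gb(per_tower):.0f} GB of weights in bf16")
```

Under these assumptions, Nano needs about 32 GB for weights alone, which leaves headroom on a 128 GB machine. Super's weights would need about 128 GB in bf16, the machine's entire memory, so it would require lower precision or more hardware.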

📺 Source: Sam Witteveen · Published June 01, 2026
🏷️ Format: Deep Dive

1 Item

Channels

No Image Available

Sam Witteveen

1 Item

Companies

No Image Available

Nvidia

Tags

Cosmos 3 DGX Spark Nvidia Sam Witteveen
