NVIDIA's New AI Is An Efficiency Monster

Description:

Two Minute Papers host Dr. Károly Zsolnai-Fehér breaks down NVIDIA's newly released 30-billion-parameter open multimodal model, which handles images, video, and audio natively and posts throughput figures that stand out from comparable systems. According to the paper, the model processes nearly 10 hours of video per hour (roughly 10x real-time), runs approximately three times faster than Qwen3 Omni on video tasks, and handles documents up to seven times faster. Local deployment requires around 25GB of VRAM, targeting high-end desktop GPUs or cloud GPU providers such as Lambda.
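For a rough sense of why a 30-billion-parameter model lands in that VRAM range, the back-of-envelope sketch below tallies weight memory at a few common precisions. The actual deployment precision and the extra budget for activations and caches are not given in the video, so both are assumptions made purely for illustration.

```python
# Back-of-envelope VRAM estimate for a 30-billion-parameter model.
# The precision options and the overhead figure are assumptions for
# illustration, not details taken from the paper or the video.

PARAMS = 30e9  # 30 billion parameters

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}

OVERHEAD_GB = 5.0  # assumed headroom for activations, caches, and buffers

for precision, bytes_per_param in BYTES_PER_PARAM.items():
    weights_gb = PARAMS * bytes_per_param / 1e9
    print(f"{precision:>10}: weights ≈ {weights_gb:5.1f} GB, "
          f"plus overhead ≈ {weights_gb + OVERHEAD_GB:5.1f} GB")
```

On those numbers, full 16-bit weights alone would overflow a single consumer GPU, while quantized weights plus runtime overhead fall into the same ballpark as the quoted 25GB figure.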

Five architectural choices are credited for the efficiency gains. Memory layers scale linearly with context length rather than quadratically, giving the model a compounding advantage on long video or multi-document inputs. An audio tokenizer converts raw waveforms into tokens while preserving emotional tone and prosody, eliminating the need for a separate heavyweight speech recognition model like Whisper. Three-dimensional convolutions process blocks of video frames simultaneously rather than frame-by-frame, compressing temporal redundancy before it reaches downstream layers. Three separate CLIP-style models—for image-text matching, fine-grained detail, and object segmentation—are distilled into a single compact encoder. Finally, an efficient video sampling step discards duplicate frames prior to final processing.
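Of these five, the two video-side ideas are the easiest to make concrete. The sketch below is a generic, minimal illustration of duplicate-frame removal followed by a 3D convolution over a block of the surviving frames; the difference threshold, kernel size, and channel counts are invented for the example and say nothing about NVIDIA's actual implementation.

```python
# Minimal sketch of two of the ideas described above: dropping
# near-duplicate frames before encoding, and running a 3D convolution
# over a block of the surviving frames so temporal redundancy is
# compressed in one pass. Generic illustration only; all numbers are
# made up and are not taken from the model.

import torch
import torch.nn as nn


def drop_duplicate_frames(frames: torch.Tensor, threshold: float = 0.01) -> torch.Tensor:
    """Keep a frame only if it differs enough from the last kept frame.

    frames: (T, C, H, W) tensor of video frames in [0, 1].
    """
    kept = [frames[0]]
    for frame in frames[1:]:
        # Mean absolute pixel difference against the last kept frame.
        if (frame - kept[-1]).abs().mean() > threshold:
            kept.append(frame)
    return torch.stack(kept)


class TemporalBlockEncoder(nn.Module):
    """3D conv that processes a block of frames jointly instead of one at a time."""

    def __init__(self, in_channels: int = 3, out_channels: int = 64):
        super().__init__()
        # Stride of 4 in time: each output feature summarizes 4 input frames.
        self.conv = nn.Conv3d(
            in_channels, out_channels,
            kernel_size=(4, 7, 7), stride=(4, 2, 2), padding=(0, 3, 3),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (T, C, H, W) -> (1, C, T, H, W), the layout Conv3d expects.
        x = frames.permute(1, 0, 2, 3).unsqueeze(0)
        return self.conv(x)


if __name__ == "__main__":
    video = torch.rand(64, 3, 224, 224)              # 64 raw frames
    sampled = drop_duplicate_frames(video)           # fewer, non-redundant frames
    features = TemporalBlockEncoder()(sampled[:32])  # joint spatio-temporal features
    print(sampled.shape, features.shape)
```

The point of the 3D kernel is that one output feature summarizes several consecutive frames at once, so repeated content is collapsed before any expensive downstream layers see it.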

On licensing, the model ships under a custom NVIDIA license that permits commercial use and derivative works with attribution requirements—more permissive than expected, though short of Apache 2.0. The model’s main limitation is pure-text reasoning and coding, where other open-weight options remain stronger choices.


📺 Source: Two Minute Papers · Published May 13, 2026
🏷️ Format: Review
