NVIDIA Ships Nemotron 3.5 ASR Streaming 0.6b: Run Locally on CPU

Description:

Fahd Mirza provides a hands-on walkthrough of NVIDIA’s newly released Nemotron 3.5 ASR Streaming 0.6b, a 600-million-parameter streaming speech recognition model that handles 40 language locales with a single unified architecture. On the test machine, an NVIDIA RTX A6000 with 48GB of VRAM, the model consumes just 439MB during inference, a footprint small enough to make running on CPU hardware practical. It delivers punctuated, capitalized transcriptions in real time with configurable chunk sizes as low as 80 milliseconds, and can sustain 70x more concurrent streams than its predecessors on a single GPU.
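To make the chunk-size figures concrete, here is a minimal sketch of how a streaming client might slice audio into fixed-duration chunks. It is not taken from the video or from NVIDIA's code: the 16 kHz sample rate and the helper names `samples_per_chunk` and `iter_chunks` are assumptions made for illustration.

```python
from typing import Iterator, List

# Assumption: 16 kHz mono audio. The summary does not state the sample rate.
SAMPLE_RATE_HZ = 16_000


def samples_per_chunk(chunk_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of audio samples contained in one chunk of the given duration."""
    return sample_rate_hz * chunk_ms // 1000


def iter_chunks(samples: List[float], chunk_ms: int,
                sample_rate_hz: int = SAMPLE_RATE_HZ) -> Iterator[List[float]]:
    """Yield consecutive fixed-size chunks; the final chunk may be shorter."""
    step = samples_per_chunk(chunk_ms, sample_rate_hz)
    if step <= 0:
        raise ValueError("chunk_ms is too small for this sample rate")
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


if __name__ == "__main__":
    one_second = [0.0] * SAMPLE_RATE_HZ  # one second of silence as stand-in audio
    for ms in (80, 320):
        chunks = list(iter_chunks(one_second, ms))
        print(f"{ms} ms: {samples_per_chunk(ms)} samples per chunk, "
              f"{len(chunks)} chunks for one second of audio")
```

Smaller chunks mean the model is called more often per second of audio, which is the usual trade-off between latency and compute in streaming recognition.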

Mirza explains the model’s cache-aware Fast Conformer RNN-T architecture in detail, walking through how language identity is injected at every frame via a 128-dimensional one-hot vector concatenated with the acoustic embedding. This design eliminates the need for separate per-language models or a standalone language detection component. In auto-detect mode, the model identifies the language on the fly; explicit language hinting provides the greatest benefit for underrepresented languages like Ukrainian and Hindi, where auto-detect added a few percentage points of error.
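The per-frame language conditioning described above can be illustrated with a small sketch. This is not the model's actual implementation: the locale-to-index table `LANG_INDEX`, the helper names, and the tiny 4-dimensional frames are all hypothetical, and the sketch does not cover how auto-detect mode is represented, since the summary does not say.

```python
from typing import Dict, List, Sequence

NUM_LANG_SLOTS = 128  # width of the one-hot language vector reported in the summary

# Hypothetical locale-to-index mapping; the real table is not given in the source.
LANG_INDEX: Dict[str, int] = {"en-US": 0, "de-DE": 1, "uk-UA": 2, "hi-IN": 3}


def one_hot(index: int, size: int = NUM_LANG_SLOTS) -> List[float]:
    """Return a vector of zeros with a single 1.0 at the given index."""
    if not 0 <= index < size:
        raise ValueError(f"index {index} is outside 0..{size - 1}")
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def condition_frames(frames: Sequence[Sequence[float]], locale: str) -> List[List[float]]:
    """Append the same language one-hot vector to every acoustic frame."""
    lang_vec = one_hot(LANG_INDEX[locale])
    return [list(frame) + lang_vec for frame in frames]


if __name__ == "__main__":
    # Two toy "acoustic embedding" frames of width 4 (real embeddings are far wider).
    frames = [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
    conditioned = condition_frames(frames, "uk-UA")
    print(len(conditioned), "frames, each of width", len(conditioned[0]))
```

Because the language signal rides along with every frame, a single set of weights can serve all locales, which is the point the summary makes about avoiding per-language models.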

Testing uses real human voice recordings from Google’s FLEURS dataset across multiple languages, with Mirza noting strong performance on English and German (8–9% word error rate at 320ms chunk size) and generally acceptable quality across most supported locales. He identifies Ukrainian and Hindi as the languages with the biggest performance gaps, where specifying a language ID at inference time meaningfully improves results, and invites native speakers in the comments to verify output quality for less familiar languages.
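For readers unfamiliar with the metric, word error rate is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The sketch below is a generic illustration, not the evaluation script used in the video; it skips the text normalization (casing, punctuation) that benchmark scoring usually applies, and the function name `word_error_rate` is chosen here for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion (reference word missing)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"  # one substitution, one deletion
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

On this scale, the 8–9% figure quoted for English and German corresponds to roughly one word in twelve being substituted, dropped, or inserted.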

📺 Source: Fahd Mirza · Published June 12, 2026
🏷️ Format: Tutorial Demo
