The NEW Best ASR – NVIDIA Nemotron 3.5 ASR

Tutorials · 2 months ago

Description:

NVIDIA’s Nemotron 3.5 ASR is a 600-million-parameter streaming speech recognition model that transcribes 40 languages from a single checkpoint and can be fully self-hosted. In this walkthrough, AI developer Sam Witteveen explains what sets Nemotron 3.5 apart from existing solutions such as Whisper: a mechanism called cache-aware streaming. Rather than re-encoding overlapping audio chunks on every pass, the model caches encoder self-attention states and reuses them as new audio arrives, which is conceptually similar to KV-caching in large language model decoding. NVIDIA reports up to 17x efficiency gains on H100 hardware; Witteveen reports noticeably faster throughput when running the model on a DGX system.
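The caching idea described above can be illustrated with a toy comparison. This is only a sketch under stated assumptions, not NVIDIA's implementation or any real library API: `project`, `attend`, `reencode_streaming`, and `cached_streaming` are hypothetical names, frames are plain integers, and "attention" is reduced to a sum over a bounded left context. It shows that reusing cached states yields the same outputs as re-encoding overlapping windows while calling the expensive step fewer times.

```python
from collections import deque


def project(frame):
    """Stand-in for the expensive per-frame encoder computation (hypothetical)."""
    return frame * frame


def attend(state, context):
    """Toy 'attention': combine a frame's state with its left-context states."""
    return state + sum(context)


def reencode_streaming(frames, chunk, left_context):
    """Baseline: every step re-encodes the left context plus the new chunk."""
    outputs, calls = [], 0
    for start in range(0, len(frames), chunk):
        window_start = max(0, start - left_context)
        window = frames[window_start:start + chunk]
        states = [project(f) for f in window]  # context frames are recomputed
        calls += len(window)
        first_new = start - window_start
        for i in range(first_new, len(window)):
            context = states[max(0, i - left_context):i]
            outputs.append(attend(states[i], context))
    return outputs, calls


def cached_streaming(frames, chunk, left_context):
    """Cache-aware: each frame is encoded once; past states come from a cache."""
    outputs, calls = [], 0
    cache = deque(maxlen=left_context)  # bounded store of previous states
    for start in range(0, len(frames), chunk):
        for frame in frames[start:start + chunk]:
            state = project(frame)
            calls += 1
            outputs.append(attend(state, list(cache)))
            cache.append(state)
    return outputs, calls


frames = list(range(1, 13))
baseline_out, baseline_calls = reencode_streaming(frames, chunk=4, left_context=4)
cached_out, cached_calls = cached_streaming(frames, chunk=4, left_context=4)
print(baseline_out == cached_out, baseline_calls, cached_calls)
```

With these toy settings both strategies produce identical outputs, but the baseline makes 20 calls to `project` against 12 for the cached version. The gap widens as the left context grows relative to the chunk size.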

The video walks through configurable inference chunk sizes (80 ms, 160 ms, 320 ms, 560 ms, or ~1 second), which let developers trade latency against transcription granularity depending on their use case. Witteveen also demonstrates word boosting, a decode-time technique that uses a scoring tree to steer the model toward user-supplied vocabulary (product names, proper nouns), with no retraining required. A third feature, diarization, enables speaker-level attribution for multi-speaker audio.
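To make the word-boosting idea concrete, here is a minimal sketch that assumes a much simpler setting than a real decoder: complete candidate transcripts are rescored after the fact, whereas an actual system would apply boosts during decoding. The functions `build_trie`, `boost_score`, and `rescore`, the bonus value, and the example scores are all hypothetical and not taken from the video or from NVIDIA's tooling.

```python
def build_trie(phrases):
    """Build a word-level trie of boosted phrases; the None key marks a phrase end."""
    root = {}
    for phrase in phrases:
        node = root
        for word in phrase.lower().split():
            node = node.setdefault(word, {})
        node[None] = True
    return root


def boost_score(words, trie, bonus_per_word=1.0):
    """Sum a bonus for every word covered by the longest boosted phrase at each position."""
    total, i = 0.0, 0
    while i < len(words):
        node, j, matched = trie, i, 0
        while j < len(words) and words[j].lower() in node:
            node = node[words[j].lower()]
            j += 1
            if None in node:  # a complete boosted phrase ends here
                matched = j - i
        if matched:
            total += bonus_per_word * matched
            i += matched
        else:
            i += 1
    return total


def rescore(hypotheses, trie, bonus_per_word=1.0):
    """Pick the best (text, base_score) pair after adding the boost bonus."""
    best = max(
        hypotheses,
        key=lambda h: h[1] + boost_score(h[0].split(), trie, bonus_per_word),
    )
    return best[0]


# Hypothetical candidates with made-up log-scores: the misspelling scores higher.
hypotheses = [("nemo tron is fast", -1.0), ("nemotron is fast", -1.5)]
print(rescore(hypotheses, build_trie([])))            # no boosting
print(rescore(hypotheses, build_trie(["Nemotron"])))  # boosted vocabulary
```

Without boosting, the higher-scoring misspelling wins; once "Nemotron" is in the trie, the bonus tips the choice to the correct spelling, and no model weights are touched.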

Witteveen runs the full demo over a local network from an NVIDIA DGX to a Mac, showing real-time transcription at different latency settings. He notes that community members have already released quantized and MLX versions of the model. For developers currently running Whisper or similar batch-oriented ASR pipelines, this video serves as a practical evaluation guide for migrating to a production-ready, low-latency streaming alternative.

📺 Source: Sam Witteveen · Published June 07, 2026
🏷️ Format: Tutorial Demo
