Description:
Sam Witteveen breaks down IBM's Granite Speech 4.1 release: a suite of three ~2-billion-parameter ASR models designed for edge deployment that collectively challenge established leaders such as OpenAI's Whisper, the community WhisperX build, and NVIDIA's Parakeet. The video opens with context on IBM's broader Granite model family, which spans language, vision, speech, and embedding models, before zeroing in on what makes this speech release distinctly different: three specialized variants rather than a single general-purpose model.
The base model, Granite Speech 4.1 2B, currently sits atop the Hugging Face Open ASR Leaderboard with a word error rate of 5.33 and a real-time factor of roughly 231, meaning an hour of audio is transcribed in about 16 seconds. It supports seven languages plus bidirectional translation and includes keyword biasing, letting users inject domain-specific terms or names directly into the prompt. The Plus variant adds speaker diarization and word-level timestamps, beating customized Whisper builds on timestamp accuracy, which makes it the go-to choice for podcast and meeting transcription use cases.
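The throughput claim is easy to sanity-check. A quick sketch, treating the real-time factor (RTF) as audio duration divided by processing time (so higher means faster); the `transcription_time` helper name is ours, and the 231 figure comes from the video:

```python
def transcription_time(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock seconds to process a clip at a given RTF
    (RTF defined here as audio duration / processing time)."""
    return audio_seconds / rtf

hour = 3600.0
# RTF ~231 -> roughly 15.6 s for an hour of audio, matching the
# "about 16 seconds" figure quoted in the video.
print(f"{transcription_time(hour, 231):.1f} s")
```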
The most technically ambitious of the three is the non-autoregressive model (the "N" variant), which uses IBM's proprietary NLE (Non-autoregressive LLM-based Editing) technique. Rather than generating tokens sequentially, it edits a fast CTC draft transcript using bidirectional attention, achieving a real-time factor of 1,820 on an H100 GPU, or roughly one hour of audio in two seconds, with only a modest accuracy tradeoff. Witteveen walks through the architectural reasoning clearly, making this a strong resource for developers evaluating ASR infrastructure.
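To make the draft-then-edit idea concrete, here is a toy sketch of the concept, not IBM's actual NLE algorithm: a fast first pass yields a draft with per-token confidences, and a single parallel edit pass revises every low-confidence token independently (no left-to-right dependency), which is what makes the approach non-autoregressive. The draft tokens, confidences, and correction table below are all invented for illustration:

```python
# Hypothetical draft from a fast CTC-style decoder: (token, confidence) pairs.
draft = [("the", 0.98), ("granit", 0.42), ("models", 0.95), ("run", 0.91),
         ("on", 0.97), ("edge", 0.88), ("devises", 0.35)]

# Stand-in for the editor's output; a real system would predict corrections
# from bidirectional context in a single forward pass.
corrections = {"granit": "granite", "devises": "devices"}

THRESHOLD = 0.5

# One parallel edit pass: each token is decided independently, so the whole
# transcript can be revised at once instead of token by token.
edited = [corrections.get(tok, tok) if conf < THRESHOLD else tok
          for tok, conf in draft]

print(" ".join(edited))  # -> "the granite models run on edge devices"
```

The speedup in the real model comes from the same property the toy shows: every edit decision is made in parallel, so decoding cost no longer grows with sequential generation steps.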
📺 Source: Sam Witteveen · Published May 07, 2026
🏷️ Format: Deep Dive
