From Transcription to Live Music: Gemini’s Audio Stack — Thor Schaeff, Google DeepMind

Descriptions:

Thor Schaeff, developer experience lead on the Gemini API and Google AI Studio at Google DeepMind, walks through the current state of Gemini’s audio stack in a conference session at AI Engineer. He starts with audio understanding in Gemini 3, which goes beyond transcription to extract speaker identity, emotional tone, language, and timestamps in a single API call. From there he traces the progression through Gemma 4’s audio support on edge devices and the recently launched Gemini 3.1 Flash Live, a full-duplex real-time conversational model that handles voice, text, and vision input simultaneously.
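As a rough illustration of what "speaker, emotion, language, and timestamps in one call" could look like on the client side, here is a minimal Python sketch. It assumes the model is prompted to reply with a JSON array; the field names (`speaker`, `language`, `emotion`, `start`, `end`, `text`), the prompt wording, and the helper functions are all hypothetical and not taken from the talk or from the Gemini API. The request to the API itself is left out.

```python
import json
from dataclasses import dataclass


# Hypothetical response shape: the talk does not specify a schema,
# so these field names are assumptions made for illustration only.
@dataclass
class Segment:
    speaker: str
    language: str
    emotion: str
    start: str  # "MM:SS" timestamp
    end: str    # "MM:SS" timestamp
    text: str


# Hypothetical prompt that would accompany the audio in a single request.
EXTRACTION_PROMPT = (
    "Transcribe the attached audio. Return a JSON array where each item has "
    "the keys: speaker, language, emotion, start, end, text. "
    "Use MM:SS timestamps."
)


def parse_segments(raw_json: str) -> list[Segment]:
    """Parse the model's JSON reply into typed segments.

    Segment(**item) raises TypeError if an item has missing or extra keys.
    """
    items = json.loads(raw_json)
    return [Segment(**item) for item in items]


def to_seconds(timestamp: str) -> int:
    """Convert an "MM:SS" timestamp into a number of seconds."""
    minutes, seconds = timestamp.split(":")
    return int(minutes) * 60 + int(seconds)


def speaking_time(segments: list[Segment]) -> dict[str, int]:
    """Sum the seconds attributed to each speaker across all segments."""
    totals: dict[str, int] = {}
    for seg in segments:
        duration = to_seconds(seg.end) - to_seconds(seg.start)
        totals[seg.speaker] = totals.get(seg.speaker, 0) + duration
    return totals
```

Because everything arrives in one structured reply, downstream steps such as per-speaker talk time or filtering by language become plain data processing rather than additional model calls.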

The talk includes a live demo of Echo Script, a Gemini 3 Flash Preview application in the Google AI Studio gallery that demonstrates rich audio extraction in one request: it labels speakers by name, identifies languages, flags emotional register, and generates English translations from multilingual audio. A second demo covers Gemini’s approach to speech generation. Rather than selecting from a large library of static voices, developers direct roughly 30 base voices using “director’s notes,” a combination of scene-setting and performance instructions that shapes accent, tone, and delivery precisely without relying on hard-coded alternatives.
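The "director's notes" idea can be sketched as a small prompt builder: the scene and performance instructions are written in plain language and placed ahead of the text to be spoken. The layout below (the `# DIRECTOR'S NOTES` and `# TRANSCRIPT` headings, the `DirectorsNotes` class, and its fields) is a hypothetical format chosen for illustration, not one prescribed in the talk or by the Gemini API.

```python
from dataclasses import dataclass


# Hypothetical container for performance direction; field names are assumptions.
@dataclass
class DirectorsNotes:
    scene: str        # where and when the line is being spoken
    style: str        # tone and emotional register
    pacing: str = ""  # optional delivery speed and rhythm
    accent: str = ""  # optional accent direction


def build_tts_prompt(notes: DirectorsNotes, transcript: str) -> str:
    """Assemble one text prompt: direction first, then the words to speak."""
    lines = [
        "# DIRECTOR'S NOTES",
        f"Scene: {notes.scene}",
        f"Style: {notes.style}",
    ]
    # Optional directions are only included when they were provided.
    if notes.pacing:
        lines.append(f"Pacing: {notes.pacing}")
    if notes.accent:
        lines.append(f"Accent: {notes.accent}")
    lines += ["", "# TRANSCRIPT", transcript.strip()]
    return "\n".join(lines)


if __name__ == "__main__":
    notes = DirectorsNotes(
        scene="A quiet radio studio at midnight",
        style="warm, low, unhurried",
        accent="Scottish",
    )
    print(build_tts_prompt(notes, "Welcome back to the late show."))
```

Changing the delivery then means editing the notes rather than swapping in a different voice, which is the trade-off the description attributes to this approach.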

Schaeff also references Veo 3.1 Lite on the generative media side and explains that all dedicated audio models are now built on top of Gemini 3’s foundational research. The session is aimed at developers building voice and multimodal applications on the Gemini API. It offers a practical map of what is available today in Google AI Studio, along with the capabilities of the underlying models.

📺 Source: AI Engineer · Published June 09, 2026
🏷️ Format: Tutorial Demo

Tags

DeepMind Gemini 3 Pro Gemini 3.1 Flash Live Gemini API Gemma 4 Google AI Studio
