Voice In, Visuals Out: The Agony and the Ecstasy – Allen Pike, Forestwalk Labs

Descriptions:

Allen Pike of Forestwalk Labs delivers a practical engineering talk on building what Andrej Karpathy has called “voice in, visuals out” experiences — AI interfaces where users speak naturally and receive visual responses rather than text. Pike argues this design pattern resolves a fundamental mismatch: voice carries far higher bandwidth than typing (more words per minute, plus tone and emphasis), while visual output is far more information-dense than synthesized speech.

The core technical challenge Pike addresses is latency. Full voice-to-voice conversation needs a round-trip response under 200 ms to feel natural, which is an essentially impossible bar once network, speech-to-text, inference, and text-to-speech latencies are added together. Visual output, however, has a much more forgiving envelope: responses appearing within one second still feel responsive. Forestwalk’s in-call voice agent, which files Linear issues and takes action on incidental speech during meetings, exploits this asymmetry. Pike shares a counterintuitive finding from production: GPT-4o mini, despite being a small model, showed P95 latencies of 5,000–10,000 ms on standard OpenAI endpoints, which makes inference platform selection as important as model selection when optimizing for latency.
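To make the budget arithmetic concrete, here is a minimal Python sketch. The 200 ms and one-second budgets come from the summary above; the per-stage numbers, the stage names, and the helper functions `p95` and `fits_budget` are hypothetical placeholders for illustration, not measurements from the talk. Summing per-stage figures is only a rough estimate of end-to-end latency.

```python
# Budgets from the summary: ~200 ms for voice-to-voice, ~1 s for visual output.
VOICE_BUDGET_MS = 200
VISUAL_BUDGET_MS = 1000


def p95(samples_ms):
    """Nearest-rank 95th percentile of a list of latency samples (ms)."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    # Integer ceil(0.95 * n), giving a 1-based nearest rank.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def fits_budget(stage_ms, budget_ms):
    """True if the summed per-stage latencies stay within the budget."""
    return sum(stage_ms.values()) <= budget_ms


# Hypothetical per-stage latencies in ms (illustrative only, not measured).
visual_pipeline = {"network": 80, "speech_to_text": 250, "inference": 400}
# Voice-to-voice adds a text-to-speech stage on top of the same pipeline.
voice_pipeline = {**visual_pipeline, "text_to_speech": 150}

visual_ok = fits_budget(visual_pipeline, VISUAL_BUDGET_MS)
voice_ok = fits_budget(voice_pipeline, VOICE_BUDGET_MS)

# A mostly fast endpoint with a slow tail: the mean hides it, the P95 does not.
samples = [100] * 18 + [7000] * 2
tail = p95(samples)
```

Under these made-up numbers the visual pipeline fits its budget while the voice-to-voice pipeline does not, and two slow requests out of twenty are enough to push the P95 into the multi-second range.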

Three practical lessons close the talk: use inference providers that optimize for latency over throughput, target the forgiving visual response window instead of chasing voice-to-voice, and stream early, that is, begin rendering a visual response before the full answer is generated so the result lands within the user’s attention span. Pike also references Thinking Machines and Neolab’s 200ms time-sliced continuous inference architecture as a promising voice-to-voice approach for teams that need it.
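The "stream early" lesson can be sketched as follows. This is a minimal illustration, not Forestwalk's implementation: `fake_model_stream` is a hypothetical stand-in for a streaming inference API, the issue text is invented, and `draw` represents whatever UI callback repaints the partial result.

```python
from typing import Callable, Iterable, Iterator


def fake_model_stream() -> Iterator[str]:
    """Hypothetical stand-in for a streaming inference API."""
    for chunk in ["Issue: ", "Fix login ", "timeout ", "(priority: high)"]:
        yield chunk


def render_streaming(chunks: Iterable[str], draw: Callable[[str], None]) -> str:
    """Redraw the partial result after every chunk instead of waiting for the end."""
    shown = ""
    for chunk in chunks:
        shown += chunk
        draw(shown)  # the first draw happens as soon as the first chunk arrives
    return shown


# Record each redraw in a list in place of a real UI.
frames: list[str] = []
final = render_streaming(fake_model_stream(), frames.append)
```

The user sees the first frame after one chunk rather than after the whole answer, so perceived latency depends on time to first chunk, not on total generation time.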

📺 Source: AI Engineer · Published June 28, 2026
🏷️ Format: Hands On Build

