Build Your Own Voice AI Translation App with OpenAI’s Real-Time Translation Model

Description:

Fahd Mirza walks through building a live voice translation application with OpenAI’s newly released GPT real-time translate model, a standalone interpreter model announced alongside updates to Whisper and the real-time API. Unlike general-purpose voice assistants, this model does exactly one thing: it takes streaming audio in and returns translated audio plus rolling transcript deltas while the speaker is still talking. It accepts more than 70 input languages, currently produces 13 output languages, and is priced at $0.034 per minute of audio.
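
The “rolling transcript deltas” idea can be pictured with a small sketch. It assumes a hypothetical event stream of `(kind, text)` pairs, where `"delta"` appends a text fragment and `"done"` closes the current utterance; these kind names are placeholders, not OpenAI’s actual event types, and `rolling_transcript` is an illustrative helper rather than anything from Mirza’s code. It yields the transcript so far after each text event, which is what a live caption UI would redraw.

```python
from typing import Iterable, Iterator, List, Tuple

# Hypothetical event shape: (kind, text). "delta" and "done" are placeholder
# names, not the real event types used by OpenAI's API.
Event = Tuple[str, str]


def rolling_transcript(events: Iterable[Event]) -> Iterator[str]:
    """Yield the full transcript-so-far after each delta or done event."""
    finished: List[str] = []  # utterances that have been closed
    current = ""              # utterance still being spoken
    for kind, text in events:
        if kind == "delta":
            current += text              # append the new fragment
        elif kind == "done":
            finished.append(current)     # close the utterance
            current = ""
        else:
            continue  # skip audio chunks and other event kinds in this sketch
        yield "\n".join(finished + ([current] if current else []))


events = [("delta", "Hola"), ("delta", " mundo"), ("done", ""), ("delta", "Bonjour")]
for snapshot in rolling_transcript(events):
    print(repr(snapshot))
```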

The architecture Mirza builds is a WebSocket relay: the browser connects to a FastAPI server (run with Uvicorn), which in turn opens a second WebSocket connection upstream to OpenAI’s translation endpoint and forwards traffic in both directions. The app needs only four Python dependencies: FastAPI, Uvicorn, the websockets library, and python-dotenv. The full code is published in his GitHub repository. The demo shows low-latency translation with live switching between languages, and Mirza narrates throughout, sometimes switching languages mid-sentence, to stress-test the model in real time.
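
At its core, a relay like this comes down to two concurrent forwarding loops, one per direction. The sketch below uses only `asyncio` and is not Mirza’s code: `pump`, `relay`, and the convention that a `recv` callable returns `None` when its connection closes are all hypothetical. In a FastAPI app you would adapt the browser side from the endpoint’s `WebSocket` object and the upstream side from a `websockets` client connection, both of which signal closure by raising an exception rather than returning `None`. The upstream URL and auth headers are omitted because the source does not give them. In-memory queues stand in for both sockets so the example runs on its own.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

# Hypothetical adapter types: recv() returns the next message (audio bytes or
# JSON text), or None once that connection has closed; send(msg) forwards it.
Recv = Callable[[], Awaitable[Optional[object]]]
Send = Callable[[object], Awaitable[None]]


async def pump(recv: Recv, send: Send) -> None:
    """Forward messages in one direction until the source closes."""
    while True:
        msg = await recv()
        if msg is None:
            return
        await send(msg)


async def relay(browser_recv: Recv, browser_send: Send,
                upstream_recv: Recv, upstream_send: Send) -> None:
    """Run browser->upstream and upstream->browser pumps concurrently."""
    tasks = [
        asyncio.create_task(pump(browser_recv, upstream_send)),  # mic audio up
        asyncio.create_task(pump(upstream_recv, browser_send)),  # translation down
    ]
    # Stop as soon as either side closes, then cancel the other direction.
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    for task in done:
        task.result()  # re-raise any error from the finished direction


async def demo() -> List[object]:
    """Drive relay() with in-memory queues standing in for both sockets."""
    browser_in: asyncio.Queue = asyncio.Queue()
    for chunk in (b"hola", b"mundo"):
        browser_in.put_nowait(chunk)  # audio the fake browser has sent
    upstream_in: asyncio.Queue = asyncio.Queue()
    received: List[object] = []  # what the fake browser gets back
    replies = 0

    async def upstream_recv() -> Optional[object]:
        # Fake translator: answers each chunk, then closes after two replies.
        nonlocal replies
        if replies == 2:
            return None
        chunk = await upstream_in.get()
        replies += 1
        return f"[translated {chunk.decode()}]"

    async def browser_send(msg: object) -> None:
        received.append(msg)

    await relay(browser_in.get, browser_send, upstream_recv, upstream_in.put)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))  # ['[translated hola]', '[translated mundo]']
```

Cancelling the surviving direction when either side finishes keeps a browser disconnect from leaving an orphaned upstream connection open, and vice versa.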

Mirza is candid about the technology’s limits: fast speech, heavy accents, and overlapping words still degrade translation quality, and at $0.034 per minute, costs can add up quickly at production scale. He frames the model as a meaningful step forward in real-time voice AI while noting that OpenAI is still actively working on fundamental challenges such as utterance boundary detection and context switching. It is a useful grounding perspective from a creator who has covered hundreds of local voice models on his channel.
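
The per-minute price makes costs at scale easy to estimate. The helper below is hypothetical and assumes billing is a flat $0.034 per minute of audio with no other charges or rounding rules; actual billing details may differ.

```python
# Price cited in the video summary; assumed flat, per minute of audio.
RATE_USD_PER_MIN = 0.034


def translation_cost(minutes_per_session: float, sessions_per_day: int,
                     days: int = 30) -> float:
    """Estimate spend in USD for a usage pattern (hypothetical helper)."""
    total_minutes = minutes_per_session * sessions_per_day * days
    return round(total_minutes * RATE_USD_PER_MIN, 2)


print(translation_cost(60, 1, days=1))  # one hour of audio
print(translation_cost(10, 500))        # 500 ten-minute sessions a day for 30 days
```

One hour of audio comes to about $2.04, while a hypothetical service handling 500 ten-minute sessions a day (150,000 minutes over 30 days) would spend about $5,100 a month.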


📺 Source: Fahd Mirza · Published May 07, 2026
🏷️ Format: Hands On Build
