Description:
Days after Google released Gemini 3 Flash, the All About AI channel built a voice-controlled live web editor that lets a developer speak instructions and watch the page update in real time — no typing required. The system uses Whisper for speech-to-text transcription, routes the transcript to Gemini 3 Flash’s API for code generation, and patches a local webpage’s DOM on the fly. A push-to-talk key keeps latency minimal and makes the loop feel nearly conversational.
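The loop described above (record while a key is held, transcribe, generate, patch) can be sketched roughly as follows. This is a minimal illustration, not the channel's actual code: the function names, the whole-document patch strategy, and the stubbed recorder/transcriber/model hooks are all assumptions.

```python
# Hypothetical sketch of the push-to-talk voice -> code loop.
# record_audio, transcribe, and generate are injected stand-ins for the
# microphone capture, Whisper speech-to-text, and the Gemini 3 Flash API.

def build_edit_prompt(transcript: str, page_html: str) -> str:
    """Wrap the spoken instruction and the current page into one prompt."""
    return (
        "You edit a live web page. Return only the full updated HTML.\n"
        f"Current page:\n{page_html}\n\n"
        f"Spoken instruction: {transcript}"
    )

def apply_patch(page_html: str, updated_html: str) -> str:
    # Simplest strategy: replace the whole document. A real editor would
    # more likely diff the response and patch individual DOM nodes.
    return updated_html if updated_html.strip() else page_html

def run_once(record_audio, transcribe, generate, page_html: str) -> str:
    """One push-to-talk cycle: speech in, updated page out."""
    audio = record_audio()           # capture while the hotkey is held
    transcript = transcribe(audio)   # e.g. Whisper speech-to-text
    prompt = build_edit_prompt(transcript, page_html)
    return apply_patch(page_html, generate(prompt))
```

Keeping the recorder, transcriber, and model behind plain callables is what makes the latency budget visible: each stage can be timed and swapped independently, which matters when the goal is a conversational feel.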
What distinguishes the demo from simple voice-to-code experiments is the integrated function-calling layer: when a spoken prompt requires a new npm package, the system installs it automatically via a registered tool. A second tool connects to the Said Image model, so asking for “a fitting image” mid-session triggers an API call, generates an image, and embeds it — all without leaving the voice interface. The demo covers a physics-based bouncing ball, a word-image matching mini-game, and a Minecraft-style 3D environment with walkable terrain and trees.
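A function-calling layer like the one described (one tool that installs npm packages, one that calls an image model) typically boils down to a set of declared schemas plus a dispatcher that routes the model's tool calls to local handlers. The sketch below assumes JSON-Schema-style declarations and invented tool names (`install_npm_package`, `generate_image`); the demo's actual names and schemas are not shown in the video description.

```python
# Hypothetical tool registry and dispatcher for the function-calling layer.
import subprocess

TOOLS = [
    {
        "name": "install_npm_package",
        "description": "Install an npm package required by the generated code.",
        "parameters": {
            "type": "object",
            "properties": {"package": {"type": "string"}},
            "required": ["package"],
        },
    },
    {
        "name": "generate_image",
        "description": "Generate an image from a text prompt and return its URL.",
        "parameters": {
            "type": "object",
            "properties": {"prompt": {"type": "string"}},
            "required": ["prompt"],
        },
    },
]

def dispatch(call: dict, *, runner=subprocess.run, image_api=None) -> str:
    """Route a model-issued function call to the matching local tool.

    `runner` and `image_api` are injectable so the npm install and the
    image-generation endpoint can be mocked or replaced.
    """
    name, args = call["name"], call.get("args", {})
    if name == "install_npm_package":
        runner(["npm", "install", args["package"]], check=True)
        return f"installed {args['package']}"
    if name == "generate_image":
        return image_api(args["prompt"])
    raise ValueError(f"unknown tool: {name}")
```

The tool result is then fed back to the model as the function response, so a spoken "add a fitting image" resolves end to end without leaving the voice interface.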
The video makes a clear case for Gemini 3 Flash’s combination of speed and function-calling reliability as the enabling factor — slower models would break the real-time feel. For developers thinking about voice-driven interfaces or live AI-assisted coding environments, the architecture shown here is a concise and reproducible starting point.
📺 Source: All About AI · Published December 19, 2025
🏷️ Format: Hands-On Build