Description:
All About AI’s “Claude Code Let’s Build” series covers assembling an end-to-end AI video oracle: a user submits any question, Gemini Flash performs live grounded web research and compresses the answer to 50 words, Qwen3 TTS (the 1.7-billion-parameter model) synthesizes speech using a reference voice file, and OmniHuman renders a talking-avatar video from the audio and a static image, delivering an MP4 answer in roughly five minutes per query.
A significant portion of the video focuses on running Qwen3 TTS locally on an Apple MacBook using MPS (Metal Performance Shaders) acceleration. The creator demonstrates voice-cloning quality against a VTuber reference audio file, then directly compares the output to ElevenLabs, concluding that for long-form, cost-sensitive use cases the 1.7B model is a viable alternative. The full six-step pipeline is shown live with a test question about Severance Season 3, with the OmniHuman avatar accurately lip-syncing the Gemini-researched answer.
All components (Qwen3 TTS, the Gemini API, and OmniHuman) were integrated using Claude Code after pulling documentation from GitHub and the respective API references. The video closes with a broader discussion about AI-generated video potentially replacing traditional search results in the future, framing the pipeline as an early prototype of personalized, dynamically generated video responses.
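The question-to-video flow described above can be sketched as three chained stages. This is a minimal illustrative outline, not the creator's actual code: every function body here is a stand-in, and all names (`research_answer`, `synthesize_speech`, `render_avatar_video`, the file paths) are hypothetical; the real build wires in the Gemini API, a local Qwen3 TTS model, and the OmniHuman API.

```python
# Hypothetical sketch of the research -> TTS -> avatar pipeline.
# Stage bodies are placeholders showing only the data flow between steps.

def research_answer(question: str, word_limit: int = 50) -> str:
    """Stand-in for a grounded Gemini Flash web-research call,
    compressed to at most `word_limit` words."""
    answer = f"Placeholder grounded answer for: {question}"
    return " ".join(answer.split()[:word_limit])

def synthesize_speech(text: str, reference_voice: str) -> bytes:
    """Stand-in for local Qwen3 TTS voice cloning from a reference clip."""
    return f"audio<{reference_voice}>:{text}".encode()

def render_avatar_video(audio: bytes, image_path: str) -> str:
    """Stand-in for an OmniHuman talking-avatar render; returns an MP4 path."""
    return "answer.mp4"

def answer_as_video(question: str) -> str:
    """Chain the three stages: research, speech synthesis, video render."""
    text = research_answer(question)
    audio = synthesize_speech(text, reference_voice="vtuber_ref.wav")
    return render_avatar_video(audio, image_path="avatar.png")
```

The glue is deliberately simple; each stage consumes the previous stage's output, which is why the whole chain can be swapped piece by piece (e.g. ElevenLabs instead of Qwen3 TTS) without touching the other stages.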
📺 Source: All About AI · Published January 23, 2026
🏷️ Format: Hands-On Build
