Clone ANY Voice for Free — Qwen Just Changed Everything

Description:

Sam Witteveen digs into the newly open-sourced Qwen 3 TTS model family from Alibaba’s Qwen team and what the release means for developers, who have historically had to choose between closed APIs (OpenAI, Google Gemini) and open-weight models with voice cloning stripped out. Qwen 3 TTS changes that equation by releasing both voice design and voice cloning under an open license, including the base models needed for fine-tuning.

The video walks through both model sizes in detail. The 0.6B model supports 10 languages and a curated set of speaker voices, including multiple Chinese dialects, while the 1.7B model adds instruction-based voice design (describing pitch, tone, and style in natural language) and zero-shot voice cloning from a short audio clip. Witteveen runs live demos in Google Colab notebooks and Hugging Face Spaces, testing multilingual generation, cross-language voice transfer, batch inference, and speaker consistency. Along the way he notes specific artifacts in the Spanish output and speculates that training data coverage is the cause.

The video also explains why releasing base models alongside the fine-tuned checkpoints is significant: it opens a path for the community to train Qwen 3 TTS on new languages and dialects beyond the official 10, which has historically been a major gap in open-source TTS. Developers interested in production voice pipelines, multilingual applications, or custom voice generation will find this a practical starting point.


📺 Source: Sam Witteveen · Published January 23, 2026
🏷️ Format: Tutorial Demo