KittenTTS – The Nano TTS

Descriptions:

Sam Witteveen reviews KittenTTS, a family of ultra-compact text-to-speech models from KittenML that challenge the assumption that usable voice synthesis requires hundreds of millions of parameters. The lineup spans three tiers: Mini at 80 million parameters (about 80 MB on disk), Micro at 40 million, and Nano at 15 million, plus an 8-bit quantized Nano variant that comes in at just 25 MB. All models are CPU-optimized, require no GPU, and ship under an Apache 2.0 license that permits commercial use.
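To put those parameter counts in perspective, here is a rough back-of-envelope sketch that converts a parameter count and a bit width into raw weight storage. It is an illustration only: the function and table names are hypothetical, the 32- and 16-bit widths are assumed for comparison, and the on-disk sizes quoted in the review need not match these raw figures, since the source does not break down what each model file contains.

```python
def raw_weight_megabytes(num_params: int, bits_per_weight: int) -> float:
    """Raw weight storage in megabytes (1 MB = 10**6 bytes).

    Counts only num_params * bits_per_weight; anything else a model
    file might carry is deliberately ignored in this estimate.
    """
    return num_params * bits_per_weight / 8 / 1_000_000


# Parameter counts as stated in the review.
TIERS = {
    "Mini": 80_000_000,
    "Micro": 40_000_000,
    "Nano": 15_000_000,
}


def size_table(bit_widths=(32, 16, 8)):
    """Map each tier to {bit width: estimated raw weight size in MB}.

    The bit widths are assumptions chosen for comparison, not a
    statement of how KittenTTS actually stores its weights.
    """
    return {
        name: {bits: raw_weight_megabytes(count, bits) for bits in bit_widths}
        for name, count in TIERS.items()
    }


if __name__ == "__main__":
    for name, row in size_table().items():
        cells = ", ".join(f"{bits}-bit: {mb:.0f} MB" for bits, mb in row.items())
        print(f"{name}: {cells}")
```

By the same estimate, a 1.7 billion parameter model stored at an assumed 16 bits per weight would need roughly 3,400 MB for its weights alone, which gives a sense of the gap between KittenTTS and the larger model it is compared against.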

Witteveen runs audio comparisons across all four variants in a Google Colab notebook with no GPU attached, playing the same test sentences through each model size. The degradation from the 80M model to the 8-bit 15M model is noticeable but more gradual than the parameter reduction might suggest: the quantized Nano produces intelligible, recognizable speech, with some artifacts and a tendency to run sentences together instead of pausing at punctuation. He contrasts this with Qwen TTS at 1.7 billion parameters, which sounds far more natural but is too large for browser or on-device deployment. KittenTTS’s small size opens up deployment scenarios that are out of reach for larger models: fully in-browser synthesis via WebAssembly, lightweight mobile apps, and constrained edge hardware.

The project appears to be primarily a solo effort, currently at version 0.8 after earlier 0.1–0.2 releases in mid-2025. Witteveen draws a comparison to Kokoro TTS as a stylistic reference point for voice character, and suggests the rapidly improving quality trajectory makes KittenTTS worth watching as a foundation for specialized fine-tunes targeting specific voice profiles or deployment environments.


📺 Source: Sam Witteveen · Published February 22, 2026
🏷️ Format: Review
