Voicebox: Free ElevenLabs Alternative – Runs Locally on Windows CPU



Description:

Fahd Mirza installs and stress-tests Voicebox, a free, open-source voice-cloning and text-to-speech application, on a CPU-only Windows machine. He documents its real-world performance and bugs rather than best-case results.

Voicebox positions itself as a self-hosted alternative to ElevenLabs. It runs entirely on-device, so no audio data is sent to external APIs. It uses Coqui TTS as its backend inference engine (with 1.7B and 6B model options) and Whisper for in-app transcription. It also provides a timeline editor for multi-voice dialogue projects and a REST API for external integrations. The desktop app is built with Tauri rather than Electron, which keeps the install footprint small. On CPU with no GPU acceleration, Mirza records an initial load time of roughly 10 minutes and a generation time of about 1 minute for a short sentence. Both are slower than GPU-backed alternatives, but the app remains usable.

The review surfaces two concrete problems. First, importing an existing voice requires a specific manifest.json structure that is not documented in the GitHub repo. Second, selecting the 1.7B model still triggers a download of the full 6B model, which suggests a bug in the model-selection logic. Despite these rough edges, Mirza rates the voice quality of the final output as strong, particularly in English. The video is useful for developers evaluating local TTS options or building privacy-sensitive speech applications without paying for a cloud voice service.


📺 Source: Fahd Mirza · Published February 24, 2026
🏷️ Format: Review

