DramaBox – Run Most Expressive TTS with Voice Cloning Locally

Description:

Fahd Mirza takes a hands-on look at DramaBox, a newly released expressive text-to-speech model that runs locally on consumer-grade hardware. DramaBox is a fine-tune of Lyra X LTX 2, built on a 3.3-billion-parameter audio-only diffusion transformer trained with flow matching and conditioned on text embeddings from the 12-billion-parameter Gemma 3 model. The architecture pairs the diffusion transformer backbone with an audio variational autoencoder and a vocoder, which allows nuanced control over delivery, including pauses, emotional shifts, and mid-sentence tonal changes.

What sets DramaBox apart from standard TTS systems is that it treats prompts as performance scripts. Dialogue goes inside double quotes and is spoken literally, while everything outside the quotes functions as a stage direction: instructions like ‘his voice fills with genuine indignation’ or ‘she pauses, exhausted’ shape the delivery without being spoken aloud. Mirza demonstrates this across multiple test cases, including a male overconfidence monologue, a female voice clone reacting to durian fruit, and a Freudian-style female character.
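
To make the script convention concrete, here is a minimal sketch that splits a prompt into spoken dialogue and stage directions. It is only an illustration of the quoting rule described above, not DramaBox's own parser: the `split_prompt` helper and the example prompt are hypothetical, and the sketch assumes straight double quotes with no nesting or escaping.

```python
import re

# Hypothetical helper, not part of DramaBox: it only illustrates the
# "quoted text is spoken, everything else is a stage direction" rule.
# Assumption: dialogue uses straight double quotes, with no nesting.
QUOTED = re.compile(r'"([^"]*)"')


def split_prompt(prompt):
    """Return a list of (kind, text) pairs in prompt order.

    kind is "dialogue" for text inside double quotes and
    "direction" for everything outside them.
    """
    segments = []
    pos = 0
    for match in QUOTED.finditer(prompt):
        # Text between the previous quote and this one is a direction.
        direction = prompt[pos:match.start()].strip()
        if direction:
            segments.append(("direction", direction))
        spoken = match.group(1).strip()
        if spoken:
            segments.append(("dialogue", spoken))
        pos = match.end()
    # Anything after the last closing quote is also a direction.
    tail = prompt[pos:].strip()
    if tail:
        segments.append(("direction", tail))
    return segments


example = (
    'His voice fills with genuine indignation. "I have never been wrong." '
    'She pauses, exhausted. "Fine."'
)
for kind, text in split_prompt(example):
    print(f"{kind:9} | {text}")
```

A check like this can be handy before sending a prompt to the model, for example to confirm that a line meant as a stage direction has not accidentally ended up inside quotes, where it would be read aloud.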

The video covers installation on Ubuntu with an NVIDIA RTX 6000 (48GB VRAM), with the model consuming just over 16GB of VRAM at runtime and the full weights coming in at roughly 26GB. Mirza’s honest assessment: expressive range is noticeably improved over older TTS models, but voice cloning fidelity still falls short of the best alternatives, and output retains a slightly synthetic quality under close listening.

📺 Source: Fahd Mirza · Published May 13, 2026
🏷️ Format: Tutorial Demo
