Run ACE-Step 1.5-XL Locally: Generate Songs with Music in Any Language for Free


Descriptions:

Fahd Mirza walks through a local installation of ACE-Step 1.5-XL Turbo, an open-source music generation model whose developers claim it matches or outperforms commercial platforms including Suno, Udio, and Mureka on five standard metrics: audio quality, coherence, musicality, style alignment, and lyric alignment. The model is demonstrated on an NVIDIA RTX 6000 with 48GB VRAM, where inference consumes roughly 19GB, and is noted to run on consumer GPUs such as an RTX 3090, with a stated minimum requirement of under 4GB VRAM.

The architecture is built around two specialized components. A language model acts as a composer, parsing the prompt to define BPM, key, lyrics, and full song structure. A diffusion transformer then renders the audio, operating directly on raw waveforms rather than spectrograms, which the developers say improves clarity in vocals and bass. The system also uses Qwen 3 for its embedded language model and its text embedding model. Served via a Gradio interface on localhost port 7860, it supports several modes: text-to-music generation, reference audio of a voice for timbre matching, and a cover mode that regenerates a song in a new style while preserving its melody. It also outputs synchronized LRC lyric files suitable for karaoke.
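For readers who want to use the LRC output in their own karaoke or subtitle tooling, the following is a minimal Python sketch of a parser for the common LRC layout, in which each line starts with one or more `[mm:ss.xx]` time tags followed by the lyric text. The function name `parse_lrc` and the sample lyrics are hypothetical, and the exact layout of the files ACE-Step writes is an assumption here, not something confirmed by the video.

```python
import re

# One leading LRC time tag, e.g. [01:23.45] or [01:23].
# Metadata tags such as [ar:Artist] do not match because they do not start with digits.
TIME_TAG = re.compile(r"\[(\d+):(\d{2})(?:[.:](\d{1,3}))?\]")


def parse_lrc(text):
    """Return a sorted list of (seconds, lyric) pairs from LRC-formatted text.

    Assumes the common LRC layout; the files a given tool emits may differ.
    """
    entries = []
    for line in text.splitlines():
        rest = line.strip()
        stamps = []
        match = TIME_TAG.match(rest)
        # A line may carry several leading tags when the same lyric repeats.
        while match:
            minutes, seconds, fraction = match.groups()
            t = int(minutes) * 60 + int(seconds)
            if fraction:
                t += int(fraction) / (10 ** len(fraction))
            stamps.append(t)
            rest = rest[match.end():]
            match = TIME_TAG.match(rest)
        # Lines without a time tag (metadata, blanks) add nothing.
        entries.extend((t, rest.strip()) for t in stamps)
    return sorted(entries)


# Hypothetical sample, not taken from the video.
sample = "[ar:Demo]\n[00:01.50]First line\n[00:12.00][01:02.25]Chorus\n"
timeline = parse_lrc(sample)
```

Each resulting pair gives the offset in seconds at which a lyric line should be shown, so a player only needs to display the last entry whose timestamp is at or before the current playback position.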

Mirza tests the model across several challenging multilingual scenarios — Urdu Sufi Qawwali, Spanish lyrics with a reference voice recording, and a style-transfer cover — producing mixed but notable results for a locally hosted open-source tool. The video provides a practical starting point for developers and musicians wanting to run high-quality music generation entirely offline.


📺 Source: Fahd Mirza · Published April 15, 2026
🏷️ Format: Tutorial Demo
