Descriptions:
ACE Step 1.5 is a local, open-source music generation model that runs inside ComfyUI. This Veteran AI video demonstrates it with a live, unedited benchmark: a full two-minute country track, with lyrics in the style of “Country Roads Take Me Home”, generated in approximately 22 seconds. The model requires as little as 4 GB of VRAM, which makes it usable on older GPUs such as the RTX 3060, and it uses a DMD2 distillation technique to achieve its speed.
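To put the benchmark in perspective, the two figures quoted above (about 120 seconds of audio in about 22 seconds of generation) can be turned into a real-time factor. The sketch below is plain arithmetic on those two numbers; the function name `real_time_factor` is made up for illustration and is not part of ACE Step or ComfyUI.

```python
def real_time_factor(audio_seconds: float, generation_seconds: float) -> float:
    """Return how many seconds of audio are produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


# Figures from the video: a two-minute track in roughly 22 seconds.
rtf = real_time_factor(audio_seconds=120, generation_seconds=22)
print(f"{rtf:.1f}x faster than real time")
```

Because the 22-second figure is approximate and comes from a single run, the resulting ratio of roughly 5.5x is a ballpark number rather than a guaranteed throughput.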
Beyond raw generation, the video covers four production features that differentiate ACE Step from services like Suno. LoRA fine-tuning lets users train personalized audio styles locally; a Chinese New Year-style LoRA is demonstrated. The Cover function transforms an existing audio file into a new version of the same song. Repainting selectively edits specific lyric segments within a generated track without regenerating the whole piece. The model also supports over 50 languages, including Mandarin and Cantonese.
The workflow walkthrough covers both the All-in-One checkpoint and the split-file model format, with text encoder options ranging from 0.6B to 4B parameters. Key generation settings are a 120-second duration (recommended range: 90–120 s), a Model Shift of 3.0, 8 sampling steps at CFG 1.0, the Euler sampler, and the Simple scheduler. Prompts follow a five-dimension structure: Genre, Instrument, Mood, Tempo, and Vocal Style.
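The settings and prompt structure described above can be written down as a small Python sketch. It is only a way to keep the values in one place, not ComfyUI's API: `GenerationSettings`, `build_prompt`, the field names, the lowercase sampler and scheduler strings, and the `Label: value` prompt layout are all assumptions made for illustration. The video specifies only the five dimensions and the numeric values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the values quoted in the walkthrough."""
    duration_s: int = 120      # recommended range: 90-120 seconds
    model_shift: float = 3.0
    steps: int = 8
    cfg: float = 1.0
    sampler: str = "euler"     # assumed spelling of the Euler sampler
    scheduler: str = "simple"  # assumed spelling of the Simple scheduler

    def duration_in_recommended_range(self) -> bool:
        """Check the duration against the 90-120 s range from the video."""
        return 90 <= self.duration_s <= 120


def build_prompt(genre: str, instrument: str, mood: str,
                 tempo: str, vocal_style: str) -> str:
    """Join the five prompt dimensions in a fixed order (layout is assumed)."""
    parts = {
        "Genre": genre,
        "Instrument": instrument,
        "Mood": mood,
        "Tempo": tempo,
        "Vocal Style": vocal_style,
    }
    return ", ".join(f"{label}: {value}" for label, value in parts.items())


settings = GenerationSettings()
prompt = build_prompt("country", "acoustic guitar", "nostalgic",
                      "90 BPM", "warm male vocal")
print(settings)
print(prompt)
```

The example prompt values are placeholders, not the prompt used in the video; in practice the values would be entered into the corresponding workflow nodes by hand.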
📺 Source: Veteran AI · Published February 05, 2026
🏷️ Format: Tutorial Demo







