Description:
Heart Mula is a new open-source family of music foundation models, released in January 2026, that can generate high-fidelity songs up to six minutes long, fully offline and free of charge. That puts it in direct competition with paid services such as Suno. This Veteran AI tutorial introduces the model's four-component architecture: an ultra-low-frequency audio tokenizer that encodes audio as discrete sequences at 12.5 Hz, the core Heart Mula generation model, a lyrics recognition module, and Heartclap, an audio-text alignment model used for labeling. Two sizes are available: a 7B model reportedly matching Suno's generation quality, and a 3B model, which is the one used throughout this demonstration.
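A low tokenizer frame rate matters because it bounds how many tokens the generator has to model for a full song. The sketch below is only back-of-the-envelope arithmetic, assuming one token frame every 1/12.5 s. The number of parallel codebooks per frame is not stated in the description, so `codebooks` is a hypothetical parameter.

```python
import math

FRAME_RATE_HZ = 12.5  # tokenizer frame rate quoted in the tutorial


def token_frames(duration_s: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of discrete frames for a clip of the given length (rounded up)."""
    return math.ceil(duration_s * frame_rate_hz)


def total_tokens(duration_s: float, codebooks: int = 1) -> int:
    """Total tokens if each frame carries `codebooks` parallel tokens.

    `codebooks` is an assumption for illustration; the source does not say
    how many token streams the Heart Mula tokenizer emits per frame.
    """
    return token_frames(duration_s) * codebooks


six_minutes_s = 6 * 60
print(token_frames(six_minutes_s))  # 4500 frames for a six-minute song
```

At 12.5 Hz a six-minute track comes to 4,500 frames per token stream, which helps explain how a 3B model can hold an entire song in context.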
The ComfyUI extension for Heart Mula runs on just 12GB of VRAM and consolidates model loading and processing into a single node. The tutorial covers key workflow guidelines: structure lyrics with explicit sections (Intro, Verse, Chorus, Outro), break lyrics into short lines so the phrasing follows natural musical pacing, and keep style descriptions to simple keyword tags rather than long prose. Generation is demonstrated in both Chinese and English, and the creator notes that the 3B model's output quality is surprisingly strong for its parameter count. The model also supports Japanese, Korean, and Spanish.
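The lyric and style guidelines can be captured in a small prompt-preparation helper. This is a hypothetical sketch: the bracketed `[Verse]` label syntax, the comma-separated tag string, and the `MAX_WORDS_PER_LINE` threshold are illustrative assumptions, not a confirmed Heart Mula or ComfyUI node input format.

```python
from typing import List, Tuple

# Hypothetical threshold for a "short" line. Word counting only makes sense
# for space-separated languages such as English or Spanish; Chinese or
# Japanese lyrics would need a character-based check instead.
MAX_WORDS_PER_LINE = 8


def format_lyrics(sections: List[Tuple[str, List[str]]]) -> str:
    """Join (section name, lines) pairs into labeled blocks, e.g. "[Verse]"."""
    blocks = []
    for name, lines in sections:
        for line in lines:
            if len(line.split()) > MAX_WORDS_PER_LINE:
                raise ValueError(f"line too long for natural pacing: {line!r}")
        blocks.append("\n".join([f"[{name}]", *lines]))
    return "\n\n".join(blocks)


def format_tags(tags: List[str]) -> str:
    """Normalize style keywords into one short comma-separated string."""
    cleaned = [t.strip().lower() for t in tags if t.strip()]
    return ", ".join(dict.fromkeys(cleaned))  # de-duplicate, keep first order


lyrics = format_lyrics([
    ("Intro", []),
    ("Verse", ["City lights are fading slow", "I keep walking home alone"]),
    ("Chorus", ["Sing it loud tonight", "Hold on to the light"]),
    ("Outro", ["Fading slow"]),
])
style = format_tags(["Pop", " piano ", "female vocal", "pop"])
print(style)  # pop, piano, female vocal
```

Rejecting overlong lines early reflects the tutorial's advice that short lines produce more natural pacing. The tag helper keeps the style prompt to a handful of keywords rather than a descriptive sentence.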
The tutorial extends beyond audio generation to show a full audio-visual pipeline: the generated music track is fed into the Wan S2V (Sound-to-Video) model to produce a synchronized video, demonstrating how Heart Mula fits into broader ComfyUI multimedia workflows. All workflows are hosted on RunningHub for cloud-based testing.
📺 Source: Veteran AI · Published January 23, 2026
🏷️ Format: Tutorial Demo