Descriptions:
Nerdy Rodent demonstrates a complete ComfyUI workflow for generating high-quality audio using Stable Audio 3, with a focus on accessibility for users with limited GPU memory. The smallest available Stable Audio 3 model weighs just 2 GB, and the tutorial walks through configurations for three model tiers (small, medium, and base), comparing output quality, required steps, and CFG values for each.
The workflow follows the presenter’s “Rodent Method” of organizing ComfyUI nodes into color-coded modular groups to reduce complexity and ease future updates. Specific configurations covered include running the medium model at 8 steps with CFG 1 for near-instant generation, testing an optional model shift node with a value of 1 to hear its effect on output character, and using the base model for fine-tuning experiments since it is undistilled. A standout segment integrates Gemma 4’s audio understanding capabilities to build an audio-to-text-to-audio pipeline: existing audio is fed to Gemma 4, which generates a descriptive style prompt, and that prompt then drives fresh generation in Stable Audio 3.
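The audio-to-text-to-audio pipeline described above is a two-stage composition, which can be sketched in a few lines of Python. This is only an illustration of the data flow, not the ComfyUI graph from the video: `describe` and `generate` are hypothetical callables standing in for the Gemma 4 audio-understanding step and the Stable Audio 3 sampling step.

```python
from typing import Callable, Tuple


def audio_to_text_to_audio(
    reference_audio: bytes,
    describe: Callable[[bytes], str],   # hypothetical: audio -> style prompt
    generate: Callable[[str], bytes],   # hypothetical: prompt -> new audio
) -> Tuple[str, bytes]:
    """Chain the two stages and return both the prompt and the new audio."""
    # Stage 1: turn the existing audio into a descriptive text prompt.
    prompt = describe(reference_audio)
    # Stage 2: drive a fresh generation from that prompt alone.
    new_audio = generate(prompt)
    return prompt, new_audio
```

Note that in this arrangement the reference audio never reaches the generator directly; only the text description does, so the result follows the style rather than the exact content of the input.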
Additional experiments include using ACE-Step-generated beats as input audio via VAE encoding, switching between the linear quadratic scheduler and standard settings, and adjusting denoise strength to control how closely outputs track the reference material. The tutorial is hands-on and reproducible, suited to anyone running local AI audio generation without high-end hardware. Patreon supporters can access pre-built versions of the workflows.
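To give a rough intuition for why denoise strength controls how closely the output tracks the reference, the sketch below uses one common convention from image-to-image style samplers: the encoded reference is partially noised and only the last portion of the schedule is run. This is an assumption for illustration; ComfyUI's samplers may compute the truncated schedule differently, and `steps_to_run` is a hypothetical helper, not a ComfyUI function.

```python
from typing import Tuple


def steps_to_run(total_steps: int, denoise: float) -> Tuple[int, int]:
    """Return (start_index, steps_run) for a given denoise strength.

    Assumed convention: denoise=1.0 runs the full schedule from pure noise,
    while lower values skip the early steps and keep more of the reference.
    """
    if not 0.0 <= denoise <= 1.0:
        raise ValueError("denoise must be between 0.0 and 1.0")
    steps_run = min(int(total_steps * denoise), total_steps)
    start_index = total_steps - steps_run
    return start_index, steps_run


# With an 8-step schedule, compare full and half denoise.
for strength in (1.0, 0.5):
    start, run = steps_to_run(8, strength)
    print(f"denoise={strength}: start at step {start}, run {run} steps")
```

Under this convention, a lower denoise value leaves fewer steps in which the sampler can move away from the encoded reference, which is why the output stays closer to the input beat.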
📺 Source: Nerdy Rodent · Published May 21, 2026
🏷️ Format: Tutorial Demo