Infinite AI Avatars from Audio! 🤯 Long Cat Video Avatar Full Guide|Auto-Loop Extension

Descriptions:

Long Cat Video Avatar is an audio-driven AI avatar model accessible through Kijai’s Wan Video extension for ComfyUI, capable of generating realistic lip-synced talking head videos from a reference image and an audio clip. This guide from Veteran AI presents three progressively more capable workflows and addresses the mixed reception the model received in early community reviews, demonstrating that natural character motion, including gestures and expressions, is achievable with the right parameter tuning.

The model uses a sliding window architecture where each generation window spans 93 frames (rather than the standard 81) to create a 13-frame overlap for seamless stitching during extension. The tutorial covers vocal separation from background music using a track separation node, the specialized Long Cat scheduler with a shift value of 12, and key parameter decisions: 480×832 resolution, CFG of 1.0, and 8 sampling steps (reduced from Kijai’s default 12 for speed without meaningful quality loss). The guide also explains an audio stride quirk: setting FPS to 32 in the node outputs video at 16 FPS because the Audio Stride is 2. Both BF16 and FP8 model variants are covered for different VRAM budgets.
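
The frame and FPS figures above lend themselves to a quick planning calculation. The following is a minimal sketch, not part of the workflow itself: the function names are hypothetical, and it assumes that every window after the first contributes 93 - 13 = 80 new frames and that the output frame rate is simply the node FPS divided by the Audio Stride.

```python
import math

# Values quoted in the guide.
WINDOW_FRAMES = 93   # frames per generation window
OVERLAP_FRAMES = 13  # frames shared between consecutive windows
NODE_FPS = 32        # FPS value entered in the node
AUDIO_STRIDE = 2     # Audio Stride setting


def output_fps(node_fps: int = NODE_FPS, audio_stride: int = AUDIO_STRIDE) -> float:
    """Effective video frame rate (assumed to be node FPS / Audio Stride)."""
    return node_fps / audio_stride


def windows_needed(audio_seconds: float) -> int:
    """Estimate how many generation windows cover an audio clip.

    Assumption: the first window yields WINDOW_FRAMES frames and each
    later window adds WINDOW_FRAMES - OVERLAP_FRAMES new frames.
    """
    total_frames = math.ceil(audio_seconds * output_fps())
    if total_frames <= WINDOW_FRAMES:
        return 1
    new_per_window = WINDOW_FRAMES - OVERLAP_FRAMES
    return 1 + math.ceil((total_frames - WINDOW_FRAMES) / new_per_window)


if __name__ == "__main__":
    print(output_fps())        # effective frame rate for the guide's settings
    print(windows_needed(30))  # windows estimated for a 30-second clip
```

Under these assumptions, a 30-second clip at 16 FPS is 480 frames, which the first window plus five extensions would cover.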

The second workflow automates the entire loop extension process, handling frame overlap calculations and stitching without manual intervention. The third removes automatic camera zoom to keep full-body framing stable across long generations. The complete workflows are hosted on RunningHub, and the guide is especially valuable for creators who want to move past the basic Kijai template toward production-ready infinite-loop avatar video generation.
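
To make the stitching step concrete, here is a rough sketch of the kind of bookkeeping the automated workflow takes over. It is an illustration under assumptions, not the workflow's actual node logic: `stitch_windows` is a hypothetical helper, and it assumes the overlapping frames at the start of each later window are simply dropped, whereas the real workflow may blend or condition on them differently.

```python
from typing import List, Sequence, TypeVar

Frame = TypeVar("Frame")


def stitch_windows(
    windows: Sequence[Sequence[Frame]], overlap: int = 13
) -> List[Frame]:
    """Concatenate generation windows into one frame sequence.

    Assumption: the first `overlap` frames of every window after the
    first repeat the tail of the previous window, so they are skipped.
    """
    stitched: List[Frame] = []
    for index, window in enumerate(windows):
        if index == 0:
            stitched.extend(window)            # keep the first window whole
        else:
            stitched.extend(window[overlap:])  # skip the repeated frames
    return stitched


if __name__ == "__main__":
    # Frame indices stand in for images: two 93-frame windows, 13 shared.
    first = list(range(0, 93))
    second = list(range(80, 173))
    video = stitch_windows([first, second])
    print(len(video))
```

Frames are represented here by integers for simplicity; in practice they would be image tensors, and the same slicing would apply along the frame axis.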


📺 Source: Veteran AI · Published December 29, 2025
🏷️ Format: Tutorial Demo
