Description:
The second LTX 2.3 deep-dive from the Veteran AI channel tackles custom audio synchronization: driving real audio clips, including multi-character dialogue, through AI-generated video with accurate lip sync. The video opens with a genuine troubleshooting narrative: after successfully testing single- and multi-character audio, the creator hit a case where lip movement stopped completely, ultimately tracing the failure to output resolution. Testing confirmed that 9:16 (720×1280) and 16:9 produce reliable results, while other aspect ratios such as 3:4 (960×1280) cause the model to drop lip animation entirely.
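A small pre-flight check makes the aspect-ratio finding concrete. The sketch below is illustrative only, not from the video: the helper name and tolerance are assumptions, while the pass/fail ratios reflect the creator's reported test results.

```python
# Illustrative pre-flight check (assumed helper, not part of any real node pack):
# flag output resolutions whose aspect ratio fell outside the combinations the
# creator found reliable for lip sync in LTX 2.3.
from math import isclose

# Aspect ratios reported as reliable in the video's tests.
RELIABLE_RATIOS = {
    "9:16": 9 / 16,  # e.g. 720x1280, confirmed working
    "16:9": 16 / 9,  # confirmed working
}

def check_lip_sync_resolution(width: int, height: int) -> bool:
    """Return True if width x height matches a ratio that preserved lip sync."""
    ratio = width / height
    return any(isclose(ratio, value, rel_tol=1e-3) for value in RELIABLE_RATIOS.values())

print(check_lip_sync_resolution(720, 1280))   # True: 9:16 worked in testing
print(check_lip_sync_resolution(960, 1280))   # False: 3:4 dropped lip animation
```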
The custom audio workflow is explained step by step: upload an audio clip; trim it to match the video duration derived from the frame count (121 frames ÷ 24 FPS ≈ 5.04 seconds); for tracks containing music, isolate the vocals first using Kijai's separation nodes; encode the audio through the LTX audio encoder to generate an audio latent; apply a black mask that locks the audio in place so the model cannot modify it; then merge the audio latent with the video latent before sampling.
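As a minimal sketch of the trim step: the actual workflow trims inside ComfyUI, and the function below is a hypothetical stand-in using torchaudio, not Kijai's node code. Only the frame-count arithmetic (121 ÷ 24 ≈ 5.04 s) comes from the video.

```python
# Hypothetical stand-in for the trim-to-duration step; the real workflow does
# this via ComfyUI nodes. Only the 121-frame / 24 FPS arithmetic is from the video.
import torch
import torchaudio

FRAME_COUNT = 121
FPS = 24
TARGET_SECONDS = FRAME_COUNT / FPS  # 121 / 24 ≈ 5.04 s of audio needed

def trim_audio_to_video(path: str) -> torch.Tensor:
    """Load an audio file and truncate it to the video's duration."""
    waveform, sample_rate = torchaudio.load(path)       # (channels, samples)
    target_samples = int(TARGET_SECONDS * sample_rate)  # samples to keep
    return waveform[:, :target_samples]                 # drop everything past ~5.04 s
```

At a 44.1 kHz sample rate this keeps roughly 222,000 samples, after which the trimmed waveform would feed the LTX audio encoder.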
Multi-character audio extends this by assigning separate masked audio regions to distinct characters within a single frame, with 16:9 strongly recommended for stability. Despite the model’s official documentation emphasizing vertical (9:16) support, the creator concludes through repeated testing that 16:9 remains the most reliable baseline. All workflows are deployable on the RunningHub cloud platform, which consistently supports the latest Kijai node updates for LTX 2.3.
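To make the per-character masking idea described above concrete, here is a conceptual sketch under heavy assumptions: the mask polarity (black/zero meaning "locked"), the tensor shapes, and the left/right split are all illustrative choices, not the video's actual node graph.

```python
# Conceptual illustration only: two masked regions on one 16:9 frame, one per
# character, so each character's audio latent is locked to its own area.
# Mask polarity (0 = locked/black, 1 = free) and the half-frame split are
# assumptions for illustration, not the video's actual node setup.
import torch

H, W = 720, 1280  # 16:9, the creator's recommended baseline for multi-character

mask_a = torch.ones(H, W)
mask_a[:, : W // 2] = 0.0   # character A: left half locked to audio clip A

mask_b = torch.ones(H, W)
mask_b[:, W // 2 :] = 0.0   # character B: right half locked to audio clip B

assert torch.all((mask_a == 0) ^ (mask_b == 0))  # the two regions never overlap
```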
📺 Source: Veteran AI · Published March 11, 2026
🏷️ Format: Tutorial Demo
