LTX2.3 | How to Extend AI Video & Audio Flawlessly, 3 Steps to Extend & Upscale LTX 2.3 Videos to 2K!

Description:

Veteran AI demonstrates a three-stage ComfyUI workflow for extending AI-generated videos with LTX 2.3. Its key feature is that video and audio are extended simultaneously while preserving the original style, character consistency, and rhythmic sync. The workflow runs on RunningHub, an online ComfyUI cloud platform, and targets 2K output through multi-stage upscaling from a 960×1280 processing resolution.
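The pre-scaling step follows from simple arithmetic: if the final pass upscales by 2x, processing should happen at half the target resolution so the upscale lands exactly on the target. A minimal sketch (the helper name `prescale_resolution` and the default 2x factor are assumptions for illustration, not part of the actual workflow):

```python
def prescale_resolution(target_w: int, target_h: int, upscale_factor: int = 2):
    """Hypothetical helper: resolution to process at so that a final
    `upscale_factor`x upscale pass lands exactly on the target size."""
    return target_w // upscale_factor, target_h // upscale_factor

# Assuming a portrait 2K target upscaled 2x, processing happens at 960x1280
print(prescale_resolution(1920, 2560))  # → (960, 1280)
```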

The core technical component is the LTXV Audio Video Mask node, which operates in latent space rather than on raw pixels or audio waveforms. The tutorial highlights a non-obvious property of LTX 2.3’s VAE encoder: it compresses not only spatial resolution but also frame count, at roughly an 8x ratio, so 60 pixel-space frames become about 8 latent frames. This dramatically reduces the computational cost of the extension operation. The mask node uses the latent representation to mark which frames are fixed reference material and which the model must generate, ensuring clean boundary handling between original and generated content.
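The latent-frame bookkeeping can be sketched as below. This is not the actual node implementation; the function names, the ceiling rounding, and the 1/0 mask convention are assumptions chosen to illustrate the 8x temporal compression and the reference-vs-generate split:

```python
import math

def latent_frame_count(pixel_frames: int, temporal_ratio: int = 8) -> int:
    """Frames remaining after the VAE's ~8x temporal compression."""
    return math.ceil(pixel_frames / temporal_ratio)

def extension_mask(ref_pixel_frames: int, new_pixel_frames: int,
                   temporal_ratio: int = 8) -> list:
    """1 = fixed reference latent frame, 0 = frame the model must generate."""
    ref = latent_frame_count(ref_pixel_frames, temporal_ratio)
    new = latent_frame_count(new_pixel_frames, temporal_ratio)
    return [1] * ref + [0] * new

# 60 pixel-space reference frames compress to 8 latent frames;
# extending by 120 pixel frames adds 15 latent frames to generate.
mask = extension_mask(60, 120)
print(sum(mask), len(mask))  # → 8 23
```

Working on ~8 latent frames instead of 60 pixel frames is what makes the extension step cheap enough to run in a single sampler pass.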

The full workflow pipeline:

- Scale the source video to half the target resolution before processing, so the subsequent 2x upscale lands on the final size.
- Extract the final N frames as reference material, using the TRIM Audio node (with frame-to-time conversion) to keep the audio aligned.
- Encode both the video and audio streams through the VAE.
- Run the LTXV sampler to generate the extension.
- Apply two upscaling passes to reach the final resolution.

The technique is demonstrated on a rhythmically challenging source clip to show that style and audio consistency hold even under difficult conditions.
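Because the audio trim is specified in time while the video trim is specified in frames, the frame-to-time conversion is just a division by the frame rate. A minimal illustration (the helper names and the 24 fps example are assumptions, not values from the tutorial):

```python
def frames_to_seconds(frames: int, fps: float) -> float:
    """Convert a frame count to a duration in seconds for audio trimming."""
    return frames / fps

def reference_audio_window(total_frames: int, tail_frames: int, fps: float):
    """Start/end times (s) of the final `tail_frames` kept as reference."""
    start = frames_to_seconds(total_frames - tail_frames, fps)
    end = frames_to_seconds(total_frames, fps)
    return start, end

# Last 24 frames of a 120-frame clip at 24 fps → audio window 4.0–5.0 s
print(reference_audio_window(120, 24, 24.0))  # → (4.0, 5.0)
```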


📺 Source: Veteran AI · Published March 23, 2026
🏷️ Format: Tutorial Demo