Descriptions:
This video from the Veteran AI channel puts two leading open-source video generation models, Wan 2.2 and LTX Video 2.3, head-to-head, using a structured technique called Prompt Relay to test whether each model can execute five distinct sequential actions within a single generated clip. The actions are showing off an outfit, flashing a peace sign, sticking out a tongue, fixing hair, and blowing a kiss, all scripted to occur in that exact order.
The core method is the Prompt Relay Timeline mode, implemented via a KJ-developed ComfyUI extension and its Prompt Relay Encode Timeline node. Rather than stuffing all instructions into one monolithic prompt — which causes models to blend or skip actions — this approach splits the prompt into timed local segments, each mapped to a specific frame range. A separate global prompt locks in the character’s appearance, clothing, and environment throughout the clip. The video walks through the full ComfyUI workflow on RunningHub, covering model loading (Wan 2.2 uses a fine-tuned A-Remix variant), sampler settings (uni_pc with simple scheduler), and resolution choices (640×960 at 16fps for Wan 2.2 vs. 720×1280 at 24fps for LTX 2.3).
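To make the timed-segment idea concrete, here is a minimal Python sketch of how five local prompts could be mapped to contiguous frame ranges alongside one global prompt. It is not the Prompt Relay Encode Timeline node's actual interface: `Segment`, `build_timeline`, the global prompt text, the even split, and the 80-frame clip length (5 seconds at 16fps) are all assumptions made for illustration, since the description does not state the clip duration or the frame ranges used in the video.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One local prompt and the frame range it applies to (hypothetical type)."""
    prompt: str
    start_frame: int  # inclusive
    end_frame: int    # exclusive


def build_timeline(actions, total_frames):
    """Split total_frames into contiguous, near-equal ranges, one per action.

    Any leftover frames are handed out one each to the earliest segments.
    """
    if not actions:
        raise ValueError("need at least one action")
    if total_frames < len(actions):
        raise ValueError("need at least one frame per action")
    base, extra = divmod(total_frames, len(actions))
    segments, start = [], 0
    for i, prompt in enumerate(actions):
        length = base + (1 if i < extra else 0)
        segments.append(Segment(prompt, start, start + length))
        start += length
    return segments


# Assumed global prompt: appearance, clothing, and environment stay fixed.
GLOBAL_PROMPT = "a woman in a red dress, studio background"

# The five actions from the test, in the order they should occur.
ACTIONS = [
    "shows off her outfit",
    "flashes a peace sign",
    "sticks out her tongue",
    "fixes her hair",
    "blows a kiss",
]

# Assumed clip length: 80 frames, i.e. 5 seconds at 16fps.
timeline = build_timeline(ACTIONS, total_frames=80)
for seg in timeline:
    print(f"[{seg.start_frame:>3}, {seg.end_frame:>3}) {GLOBAL_PROMPT}; {seg.prompt}")
```

An even split is only the simplest choice. In practice, an action that needs more time, such as fixing hair, could be given a longer range by hand, and the same five segments would need rescaling for a 24fps model to cover the same duration.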
The findings favor Wan 2.2 for action sequence fidelity: across close-up, medium, and wide shots, Wan 2.2 completes more steps in the correct order despite lower raw image quality. LTX 2.3 produces sharper, more stable frames but frequently drops or merges actions. Viewers leave with a reusable workflow template and a clearer picture of where each model currently excels.
📺 Source: Veteran AI · Published May 19, 2026
🏷️ Format: Comparison