Description:
Dan Kieft walks through a complete end-to-end pipeline for creating photorealistic AI avatars that look and sound like you, covering image generation, short-form video, cinematic B-roll, and long-form YouTube content across multiple camera angles.
The workflow is built primarily around Kling (also referred to in the video as Higgsfield), with HeyGen handling long-form content and ElevenLabs providing high-fidelity professional voice cloning. For image generation, Kieft explains two approaches: a quick character-sheet method using a single reference photo processed through Nano Banana, and a more powerful Soul ID method that trains on 20+ images to preserve identity consistently across poses, lighting, and backgrounds. Using Pixel Soul with a trained Soul ID, the system reproduces fine details such as facial hair, moles, and acne scarring across generated stills.
For video production, Kieft demonstrates Kling’s start/end-frame feature to animate between poses for short-form content, B-roll generation for ad-style videos, and lip-sync workflows using both Kling’s built-in voice cloning and ElevenLabs’ professional voice clone (trained on roughly 30 minutes of audio) for higher-fidelity output. The full stack—Nano Banana for character sheets, Kling for generation and animation, HeyGen for long-form, and ElevenLabs for voice—represents a practical production pipeline for creators who want to scale video output without appearing on camera for every shoot.
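The description above covers the tools at a high level; for creators automating the voice step, a minimal sketch of driving a cloned ElevenLabs voice through its public text-to-speech REST endpoint might look like the following. This is an illustrative sketch, not the method shown in the video: the `VOICE_ID` placeholder, the helper name, and the choice of model are assumptions, and a real cloned-voice ID plus an API key are required before any audio is actually generated.

```python
import json
import os
import urllib.request

API_BASE = "https://api.elevenlabs.io/v1"

def build_tts_request(voice_id: str, text: str, api_key: str,
                      model_id: str = "eleven_multilingual_v2") -> urllib.request.Request:
    """Build a POST request to the ElevenLabs text-to-speech endpoint
    for a given (e.g. cloned) voice. Sending it returns MP3 audio bytes."""
    url = f"{API_BASE}/text-to-speech/{voice_id}"
    body = json.dumps({"text": text, "model_id": model_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

if __name__ == "__main__":
    # "VOICE_ID" is a placeholder; substitute the ID of your cloned voice.
    req = build_tts_request(
        "VOICE_ID",
        "Hello from my AI avatar.",
        os.environ.get("ELEVENLABS_API_KEY", ""),
    )
    print(req.full_url)
    # Only send the request with a real key and voice ID, e.g.:
    # audio = urllib.request.urlopen(req).read()
    # open("avatar_line.mp3", "wb").write(audio)
```

The request is built separately from being sent so the script can be inspected or tested without network access or an API key.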
📺 Source: Dan Kieft · Published March 27, 2026
🏷️ Format: Tutorial Demo
