Netflix VOID In ComfyUI Tutorial: Pro Video Object & Interaction Removal

Description:

The Veteran AI channel covers VOID, a video inpainting model open-sourced by Netflix on Hugging Face, and walks through the full ComfyUI workflow for running it locally. Standard video inpainting tools simply fill in the background behind a removed object. VOID (Video Object and Interaction Deletion) also attempts to erase the causal effects that object had on the scene: shadows, physical contact points, and interactions with other elements.

The model is fine-tuned from CogVideoX-Fun V1.5-5B-InP and introduces a quadmask system that divides each frame into four zones: the main removal target, the overlap region, the affected area, and the background to preserve. The ComfyUI workflow is broken into five stages: model loading (T5XXL FP16 text encoder, CogVideoX VAE, RAFT-large optical flow estimator, and SAM 3.1 for automatic mask generation), video preprocessing at 672×384 resolution with 121 frames at 24 fps, mask creation, inpainting condition setup, and a two-pass sampling process. Pass 2 starts from noise warped along the optical flow of the Pass 1 result rather than from fresh random noise, which improves temporal stability and reduces flicker. A bypass switch allows skipping Pass 2 when the Pass 1 result is sufficient.
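
To make the four-zone idea concrete, here is a minimal sketch of how a quadmask could be assembled from two binary masks, one for the object and one for the region it affects. This is an illustration only, not the VOID or ComfyUI node implementation. The function name `build_quadmask`, the integer labels, and the reading of "overlap" as pixels covered by both masks are all assumptions; the encoding the model actually expects is not given in this description.

```python
# Hypothetical zone labels. The values VOID actually uses are not stated
# in the description, so these integers are placeholders.
REMOVE, OVERLAP, AFFECTED, KEEP = 0, 1, 2, 3


def build_quadmask(object_mask, affected_mask):
    """Merge two binary masks (lists of rows of 0/1) into one four-zone map.

    Assumption: the "overlap" zone is where the object mask and the
    affected-area mask both cover the same pixel.
    """
    quad = []
    for obj_row, aff_row in zip(object_mask, affected_mask):
        row = []
        for is_obj, is_aff in zip(obj_row, aff_row):
            if is_obj and is_aff:
                row.append(OVERLAP)
            elif is_obj:
                row.append(REMOVE)
            elif is_aff:
                row.append(AFFECTED)
            else:
                row.append(KEEP)
        quad.append(row)
    return quad


# Toy 4x6 frame: the object covers columns 1-2, its shadow columns 2-4.
object_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
affected_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]

quad = build_quadmask(object_mask, affected_mask)
for row in quad:
    print(row)
```

In a real workflow the two input masks would come from the segmentation stage, one per frame. The point of the sketch is only that every pixel ends up in exactly one of the four zones.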

Several test cases are analyzed in depth, including a two-person fighting scene and a skateboarding clip. The video notes that shadow removal fails consistently when the mask does not cover the full affected region. The workflow is also demonstrated on RunningHub for users without local GPU resources.


📺 Source: Veteran AI · Published May 18, 2026
🏷️ Format: Tutorial Demo
