Run HiDream-O1-Image Locally with ComfyUI


Descriptions:

Fahd Mirza walks through the complete local installation of HiDream-O1, a newly released 8-billion-parameter image generation model from HiDream, running on an Ubuntu system with an NVIDIA RTX A6000 GPU (48 GB VRAM). The tutorial covers downloading the correct ComfyUI-compatible model files from Hugging Face’s Comfy Org repository, placing checkpoints and text encoders in the appropriate ComfyUI directories, and loading a custom workflow available on the presenter’s GitHub. The model files come in BF16 (16.4 GB), FP8, and MXFP8 precision variants to accommodate different VRAM budgets. At BF16 precision the model consumes approximately 16 GB of VRAM and generates images in around 30 to 40 seconds per prompt.
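The link between precision variant and VRAM budget can be approximated with simple arithmetic: parameter count times bytes per weight. The sketch below is a rough estimate, not a measurement from the video. It assumes exactly 8 billion parameters (the description only says "8-billion-parameter"), counts weights only, and assumes MXFP8 follows the OCP microscaling layout of one 8-bit shared scale per block of 32 elements. The function names and the 2 GB headroom default are hypothetical, chosen for illustration.

```python
# Rough weight-size estimate per precision variant (weights only, no activations).
PARAMS = 8e9  # assumption: the description says "8-billion-parameter"; exact count unknown

# Bits stored per weight. The MXFP8 value assumes one 8-bit shared scale
# per block of 32 elements (OCP microscaling layout); treat it as an assumption.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "FP8": 8.0,
    "MXFP8": 8.0 + 8.0 / 32,
}


def estimated_weight_gb(variant: str, params: float = PARAMS) -> float:
    """Approximate weight size in decimal gigabytes for a precision variant."""
    return params * BITS_PER_WEIGHT[variant] / 8 / 1e9


def variants_that_fit(vram_gb: float, headroom_gb: float = 2.0) -> list[str]:
    """Variants whose estimated weights leave `headroom_gb` free (headroom is a guess)."""
    return [
        name
        for name in BITS_PER_WEIGHT
        if estimated_weight_gb(name) + headroom_gb <= vram_gb
    ]


if __name__ == "__main__":
    for name in BITS_PER_WEIGHT:
        print(f"{name}: ~{estimated_weight_gb(name):.2f} GB")
    print("Fits in 12 GB:", variants_that_fit(12.0))
```

The BF16 estimate of about 16 GB is in line with the 16.4 GB file size and the roughly 16 GB of VRAM reported above; the FP8 and MXFP8 figures are estimates only, since the description does not list their sizes.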

What distinguishes HiDream-O1 architecturally is its departure from the standard diffusion model recipe. Rather than combining a separate text encoder, VAE, and diffusion transformer, it uses a pixel-level unified transformer (UIT) that processes text, image, and conditioning signals in a single shared token space, eliminating the disjoint pipeline that most current models rely on.
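To make the idea of a single shared token sequence concrete, here is a toy illustration in plain Python. It is not HiDream's code and makes no claim about the model's real tokenizer, patch size, or embeddings. It only shows text tokens and raw pixel patches (no VAE latent in between) ending up in one sequence that a single transformer could consume. All names here (`Token`, `unified_sequence`, and so on) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Token:
    modality: str   # "text" or "pixel"
    position: int   # index within its own modality
    payload: tuple  # toy stand-in for an embedding vector


def text_tokens(prompt):
    # Toy tokenizer: one token per whitespace-separated word.
    return [Token("text", i, (word,)) for i, word in enumerate(prompt.split())]


def pixel_patch_tokens(image, patch):
    # Split an H x W grid of pixel values into non-overlapping patch x patch
    # tiles in row-major order. Assumes H and W are multiples of `patch`.
    height, width = len(image), len(image[0])
    tokens = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tile = tuple(
                image[r][c]
                for r in range(top, top + patch)
                for c in range(left, left + patch)
            )
            tokens.append(Token("pixel", len(tokens), tile))
    return tokens


def unified_sequence(prompt, image, patch=2):
    # One flat sequence: text tokens first, then raw pixel patches.
    return text_tokens(prompt) + pixel_patch_tokens(image, patch)


if __name__ == "__main__":
    image = [[row * 4 + col for col in range(4)] for row in range(4)]
    for token in unified_sequence("a red fox", image):
        print(token)
```

In a conventional pipeline the text would pass through a separate encoder and the image through a VAE before reaching the diffusion transformer; the sketch simply skips both stages to mirror the description above.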

Mirza tests the model across five challenging prompt categories: a multi-panel comic strip with embedded text, a cinematic group portrait with specific clothing and lighting conditions, an elaborate editorial fashion photograph with intricate accessory detail, an anime character reference sheet, and a 16-bit pixel art RPG sprite sheet showing front, back, and side views. Text rendering, fine detail fidelity, and multi-view consistency all perform notably well, with hand anatomy remaining the most visible weakness.


📺 Source: Fahd Mirza · Published May 17, 2026
🏷️ Format: Tutorial Demo
