Microsoft Lens in ComfyUI – Small but Powerful!


Description:

The Nerdy Rodent channel walks through a detailed ComfyUI workflow for Microsoft Lens, a 3.8-billion-parameter text-to-image model released by Microsoft, with both default and distilled checkpoints available in ComfyUI format. The tutorial covers the full pipeline from model loading to generation and refinement, using the channel’s signature modular ‘rodent method’: color-coded node groups linked by get/set passthrough nodes from KJ Nodes, which eliminates spaghetti wiring and makes workflows easy to update.

The video demonstrates multiple generation modes: basic text-to-image using the Lens CLIP type (GPT OSS NVFP4) and a simple scheduler; image-to-image with configurable denoise; and inpainting with mask control, using either manual masking or SAM 3 prompt-based automatic masking, with the any switch from RG3 nodes toggling between the two. For upscaling, the tutorial compares four approaches using side-by-side slider comparisons: standard image upscale followed by a second-pass DDIM + linear quadratic sampling stage; latent upscale, which introduces noise for additional detail generation; the tile-based Ultimate SD Upscale node, suited to very large output sizes; and Seed VR 2, which acts as near-pure sharpening with minimal compositional change.
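To give a feel for why a tile-based upscaler suits very large outputs, the arithmetic can be sketched as below. This is an illustrative calculation only, not the Ultimate SD Upscale node's code: the function names and the 1024-pixel tile size are assumptions, and tile overlap and padding are ignored.

```python
import math

def upscaled_size(width, height, factor):
    """Output size after scaling an image by `factor` (hypothetical helper)."""
    return round(width * factor), round(height * factor)

def tile_grid(width, height, tile=1024):
    """Return (columns, rows, total tiles) needed to cover the image.

    `tile` is an assumed tile edge length in pixels; overlap is ignored.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    return cols, rows, cols * rows

# A 2048x2048 generation upscaled 2x is processed as a 4x4 grid of tiles,
# so each sampling pass only ever sees a 1024x1024 region.
w, h = upscaled_size(2048, 2048, 2.0)
print(w, h, tile_grid(w, h))
```

The point of the sketch is that the per-tile working size stays fixed while only the tile count grows with the output resolution.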

Specific technical notes include the Load Diffusion Model node configuration, Model Sampling Flex for resolution passthrough, the VAE shared with Flux 2, a practical resolution ceiling of around 2048×2048 for initial generation, and grow-mask-with-blur adjustments for cleaner SAM 3 edge handling. The tutorial is aimed at ComfyUI users who are familiar with Flux-style workflows and want to evaluate Lens as a text-to-image alternative.
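The idea behind growing and blurring a mask can be sketched in a few lines. This is a minimal pure-Python illustration, not the actual node implementation: the function names are hypothetical, the mask is a small list of rows, and a simple box blur stands in for whatever blur the real node applies.

```python
def grow_mask(mask, expand):
    """Dilate a binary mask by `expand` pixels using a square neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - expand), min(h, y + expand + 1)
            x0, x1 = max(0, x - expand), min(w, x + expand + 1)
            hit = any(mask[j][i] for j in range(y0, y1) for i in range(x0, x1))
            out[y][x] = 1.0 if hit else 0.0
    return out

def box_blur(mask, radius):
    """Average each pixel with its neighbours to soften the mask edge."""
    h, w = len(mask), len(mask[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - radius), min(h, y + radius + 1)
            x0, x1 = max(0, x - radius), min(w, x + radius + 1)
            window = [mask[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            out[y][x] = sum(window) / len(window)
    return out

def grow_mask_with_blur(mask, expand=1, blur_radius=1):
    """Grow the mask first, then blur it so the inpaint edge fades out."""
    return box_blur(grow_mask(mask, expand), blur_radius)

# A single masked pixel in the centre of a 7x7 image.
mask = [[0] * 7 for _ in range(7)]
mask[3][3] = 1
soft = grow_mask_with_blur(mask, expand=1, blur_radius=1)
```

Growing pushes the mask slightly past the detected object boundary, and the blur turns the hard edge into a gradient, which is the general reason such adjustments tend to give cleaner blends around automatically generated masks.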


📺 Source: Nerdy Rodent · Published May 28, 2026
🏷️ Format: Tutorial Demo
