Description:
Fahd Mirza walks through a complete local installation of UI-Venus-1.5, a GUI agent model from Inclusion AI designed to navigate websites and applications autonomously by analyzing screenshots. Built on top of Qwen3-VL, the model accepts a screenshot plus a plain-English instruction and outputs the exact action to perform (click, type, or scroll), making it suitable for building screen-control automation pipelines without hardcoded UI selectors.
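The video does not pin down the model's exact output grammar, so as a rough sketch only: assuming the model emits action strings like `click(812, 430)` or `type("hello")` (a hypothetical format, not UI-Venus's documented one), a driver in an automation pipeline might parse them into structured actions like this:

```python
import re
from dataclasses import dataclass
from typing import Optional

@dataclass
class UIAction:
    kind: str                   # "click", "type", or "scroll"
    x: Optional[int] = None     # screen coordinates for click/scroll
    y: Optional[int] = None
    text: Optional[str] = None  # payload for type actions

def parse_action(raw: str) -> UIAction:
    """Parse a hypothetical action string, e.g.:
    click(812, 430) | type("hello") | scroll(0, -300)
    The grammar here is an assumption for illustration only."""
    raw = raw.strip()
    m = re.match(r'(click|scroll)\((-?\d+),\s*(-?\d+)\)$', raw)
    if m:
        return UIAction(kind=m.group(1), x=int(m.group(2)), y=int(m.group(3)))
    m = re.match(r'type\("(.*)"\)$', raw)
    if m:
        return UIAction(kind="type", text=m.group(1))
    raise ValueError(f"unrecognized action: {raw!r}")
```

A structured action like this can then be handed to any screen-control backend (e.g. a browser driver or ADB) without the model knowing anything about UI selectors.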
Mirza runs the installation on Ubuntu with an RTX 6000 GPU (48GB VRAM) using vLLM and Hugging Face Transformers, demonstrating the 2-billion-parameter variant, which comes in under 5GB on disk and occupies under 25GB of VRAM when loaded. The model's development followed four stages: base Qwen3-VL pretraining, large-scale GUI data pretraining, reinforcement learning across separate mobile and web grounding tasks, and final merging of the specialized checkpoints into one unified model. Benchmark results show 77.6% on AndroidWorld and 69.6% on ScreenSpot Pro, outperforming GPT-4 on both evaluations. Model sizes span 2B, 8B, and 30B parameters.
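The disk and VRAM figures quoted above are consistent with bf16 weights (2 bytes per parameter) plus runtime overhead; a back-of-envelope estimate, assuming bf16 storage:

```python
def bf16_weight_gb(num_params: float) -> float:
    """Approximate weight size in GB at bf16 precision (2 bytes/param)."""
    return num_params * 2 / 1e9

# 2B parameters at bf16 -> roughly 4 GB of weights, matching the
# under-5GB on-disk figure. The gap up to the ~25GB VRAM footprint
# is activations, the KV cache, and vLLM's preallocated GPU blocks.
print(round(bf16_weight_gb(2e9), 1))   # 4.0
print(round(bf16_weight_gb(30e9), 1))  # 60.0 -> the 30B flagship
                                       # exceeds a single 48GB card at bf16
```

This is why the 2B variant fits comfortably on the demo's RTX 6000, while the larger variants would need quantization or multiple GPUs.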
The live demo covers coordinate-based output for precise UI element location, handling both mobile and desktop screenshots, and navigating a real YouTube channel page. Mirza notes the model currently performs strongest on Chinese-language apps and recommends the mixture-of-experts flagship variant for production or customer-facing deployments, while the 2B model is sufficient for evaluation and prototyping.
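Coordinate-grounded GUI models often emit positions normalized to a fixed range rather than raw pixels, so the same output works across screenshot resolutions. Assuming a 0-1000 normalized space (a common convention for such models, not confirmed for UI-Venus here), mapping a predicted point back onto the actual screenshot is a one-liner:

```python
def to_pixels(nx: int, ny: int, width: int, height: int,
              scale: int = 1000) -> tuple[int, int]:
    """Map a normalized point (nx, ny) in [0, scale] to pixel
    coordinates for a screenshot of the given width and height."""
    return round(nx * width / scale), round(ny * height / scale)

# A predicted point at (500, 250) on a 1920x1080 desktop screenshot
# lands at the horizontal center, a quarter of the way down:
print(to_pixels(500, 250, 1920, 1080))  # (960, 270)
```

The same normalized point resolves correctly on a mobile screenshot too, which is what lets one model handle both the mobile and desktop cases shown in the demo.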
📺 Source: Fahd Mirza · Published March 01, 2026
🏷️ Format: Tutorial Demo
