Penguin-VL in 2B and 8B: Worst Vision AI Model Ever: Full Local Testing


Description:

Fahd Mirza puts Tencent's newly released Penguin-VL vision-language models — available in 2B and 8B parameter sizes — through a series of real-world vision tests on an Nvidia RTX 6000 (48GB VRAM), and his verdict is blunt: both models perform poorly across multiple visual understanding tasks.

Architecturally, Penguin-VL is notable for replacing traditional contrastive vision encoders with an LLM-based vision encoder, a design choice inspired by Qwen's approach that enables tighter vision-language alignment. The 2B model consumes under 10GB of VRAM. Tencent's benchmarks claim strong performance on DocVQA, ChartQA, and InfoVQA — topping other 2B models. Mirza tests against those stated strengths: a chart comprehension task where the model refuses to answer, claiming it cannot perform real-time measurements; a traffic scene interpretation that returns an incorrect lane identification; and a simple object-in-hand recognition task where the model hallucinates an apple. The 8B model fares no better on the same prompts.

Throughout, Mirza draws explicit comparisons to Qwen 3.5, noting that even Qwen’s 8B vision model significantly outperforms Penguin-VL on equivalent tasks. The video serves as a practical warning for developers considering small open-weight VLMs: published benchmark scores on curated datasets do not reliably predict real-world vision reasoning capability, and Penguin-VL’s gap between claimed and observed performance is unusually wide.


📺 Source: Fahd Mirza · Published March 14, 2026
🏷️ Format: Benchmark Test
