Description:
Google’s release of Gemini 3.1 Pro is the focus of this video from Matthew Berman, which breaks down the model’s benchmark performance and real-world capabilities in detail. On ARC-AGI 2 (a test of rapid skill acquisition and generalization), Gemini 3.1 Pro scores 77.1%, more than double the score of its predecessor, Gemini 3 Pro, and ahead of Anthropic’s Opus 4.6 at 68.8%. Other headline numbers include 94.3% on GPQA Diamond (scientific knowledge), 80.6% on SWE-bench Verified (coding), 99.3% on T2Bench (agentic tool use), and 51.4% on Humanity’s Last Exam when run with a code environment, putting it squarely in competition with the top frontier models.
Beyond benchmarks, the video showcases dramatically improved SVG generation, with Google DeepMind Chief Scientist Jeff Dean demonstrating applications that include a geographic urban-planning simulator and a prompt-to-CAD-model tool. Berman also notes that Gemini Deep Think, released the prior week, was confirmed to run on Gemini 3.1 Pro under the hood, and that the model is rolling out across Google’s consumer and developer products.
Berman rounds out the analysis with a frank comparison with Anthropic’s Sonnet 4.6, which he calls his current favorite for knowledge work despite its high cost, and reflects on his brief stint using Gemini 3 Pro as his primary model. The overall takeaway is that Gemini 3.1 Pro is a top-tier model for complex reasoning tasks, though real-world usability will depend on hands-on testing beyond benchmarks.
📺 Source: Matthew Berman · Published February 20, 2026
🏷️ Format: Review
