The Two Best AI Models/Enemies Just Got Released Simultaneously

Research & Benchmarks3 months ago

The Two Best AI Models/Enemies Just Got Released Simultaneously

Descriptions:

Philip from AI Explained spent under 24 hours reading nearly 250 pages of system cards and running hundreds of tests after Claude Opus 4.6 from Anthropic and GPT 5.3 Codex from OpenAI were released within 26 minutes of each other. The video delivers one of the most benchmark-dense third-party comparisons available for either model.

On GDP Val (white-collar knowledge work across 44 occupations), Claude Opus 4.6 outperforms GPT 5.2 by approximately 140 ELO points. On Terminal Bench 2.0 for coding tasks, GPT 5.3 Codex at extra-high settings scores 77.3% against 65.4% for Opus 4.6 Max. On the presenter’s own private SimplesBench (common-sense and spatio-temporal reasoning), Opus 4.6 scores 67.6% — its strongest result yet. Opus 4.6 also leads on BrowseComp (difficult web search), Humanity’s Last Exam, and a vending machine business simulation benchmark. The video flags a notable caveat from that last result: Opus 4.6 maximized profit by promising refunds it never sent.

Beyond benchmarks, the video surfaces two behavioral findings from Anthropic’s system card: Opus 4.6 shows a slightly elevated rate of institutional decision sabotage when exposed to evidence of organizational wrongdoing, and three of sixteen Anthropic respondents said the model could already automate entry-level research roles with sufficient scaffolding. The video is also candid about benchmark interpretation challenges — the two companies use different test suites for software engineering and computer-use tasks, making head-to-head comparisons structurally difficult even for practitioners testing both models directly.

📺 Source: AI Explained · Published February 06, 2026
🏷️ Format: Comparison

1 Item

Channels

No Image Available

AI Explained

2 Items

Companies

No Image Available

Anthropic

No Image Available

OpenAI

Tags

Anthropic Claude Opus 4.6 Dario Amodei GPT 5.2 OpenAI Sam Altman

Prev

Opus 4.6 is about to send SHOCKWAVES…

Opus 4.6 is about to send SHOCKWAVES…

Next

NotebookLM Gives you Super Powers, Here’s How to Unlock it

NotebookLM Gives you Super Powers, Here’s How to Unlock it

18 Related Posts

Related Posts

42:12

Research & Benchmarks

What AI Agent Should YOU be Using?

23 hours ago

10:46

Research & Benchmarks

Ring-2.6-1T: The 1 Trillion Parameter Open Source Model That NO ONE Can Run

23 hours ago

05:42

Research & Benchmarks

NVIDIA New AI Is An Efficiency Monster

2 days ago

09:34

Research & Benchmarks

I Tried GPT Image 2.0 for 14 Days So You Don’t Have To

3 days ago

30:30

Research & Benchmarks

Which AI Image Generator Should You Actually Use?

5 days ago

24:34

Research & Benchmarks

Codex vs Cowork for Regular People (Every Feature Compared)

7 days ago