Description:
Matthew Berman delivers an early-access review of GPT-5.4 after a week of hands-on testing, positioning the model as OpenAI’s direct answer to Anthropic’s Opus 4.6. The core architectural argument: where OpenAI previously split coding capability (GPT-5.3 Codex) and general reasoning/personality (GPT-5.2) across two separate models, GPT-5.4 merges both into a single frontier model with a 1-million-token context window, matching Claude’s context offering for the first time.
Berman walks through OpenAI’s benchmark comparisons against Anthropic and Google models — a notable departure from prior releases. GPT-5.4 Thinking scores 75% on OSWorld (computer use) versus Opus 4.6’s 72.7%, 57.7% on SWE-bench Pro, and 83% on OpenAI’s GDPval benchmark — 5 points above Opus 4.6 and 13 points above GPT-5.3 Codex. He flags that GPT-5.4 Pro, the more expensive variant, scores lower than 5.4 Thinking on several benchmarks despite its higher cost. The model also introduces a “plan before build” mode for agentic workflows, letting users review a structured plan before committing tokens to execution.
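The “plan before build” idea can be sketched as a two-phase loop: a cheap planning call produces a structured plan, the user approves or rejects it, and only approved plans proceed to the expensive execution phase. The sketch below is a minimal illustration of that control flow; `request_plan`, `run_with_plan`, and the plan shape are hypothetical stand-ins, not OpenAI’s actual API.

```python
# Hypothetical "plan before build" loop: the model emits a structured plan
# that the user reviews BEFORE any execution tokens are spent.
from dataclasses import dataclass, field


@dataclass
class Plan:
    task: str
    steps: list = field(default_factory=list)


def request_plan(task: str) -> Plan:
    # Stub for the cheap planning call; a real agent would ask the model
    # for a structured plan here.
    return Plan(task=task, steps=["inspect repo", "write patch", "run tests"])


def run_with_plan(task: str, approve) -> list:
    plan = request_plan(task)
    if not approve(plan):  # user review gates the expensive phase
        return []
    # Stub for the execution phase: one model/tool call per approved step.
    return [f"done: {step}" for step in plan.steps]


results = run_with_plan("fix flaky test", approve=lambda p: len(p.steps) <= 5)
print(results)
```

The point of the gate is economic: rejecting a bad plan costs one short planning call rather than a long agentic run.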
Demo highlights include real-time Gmail automation (starring, labeling, creating calendar invites), bulk JSON data entry at what appears to be real-time speed, and vision-driven computer use. Berman identifies knowledge workers — those using AI for document processing, spreadsheets, browser automation, and coding — as the primary target segment, mirroring the positioning Anthropic used for Sonnet 4.6.
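The Gmail demo’s actions (starring, labeling, creating invites) fit the standard agent tool-calling pattern: each action is registered as a named tool, and a dispatcher applies whatever tool calls the model emits. The sketch below shows only that dispatch pattern; the tool names, payload shapes, and handlers are illustrative assumptions, not the demo’s actual code.

```python
# Hypothetical tool registry for the Gmail-style demo: handlers stand in
# for real Gmail/Calendar API calls.
def star_message(msg_id):
    return f"starred {msg_id}"


def apply_label(msg_id, label):
    return f"labeled {msg_id} as {label}"


def create_invite(title, when):
    return f"invite '{title}' at {when}"


TOOLS = {"star": star_message, "label": apply_label, "invite": create_invite}


def dispatch(tool_calls):
    # tool_calls: list of (name, kwargs) pairs, shaped like the structured
    # tool calls a model might emit during an agentic run.
    return [TOOLS[name](**kwargs) for name, kwargs in tool_calls]


log = dispatch([
    ("star", {"msg_id": "m1"}),
    ("label", {"msg_id": "m1", "label": "travel"}),
    ("invite", {"title": "Flight", "when": "2026-03-07T09:00"}),
])
print(log)
```

In a real integration the handlers would wrap authenticated Gmail and Calendar API calls; the model only ever sees the tool names and argument schemas.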
📺 Source: Matthew Berman · Published March 06, 2026
🏷️ Format: Review
