Description:
Matthew Berman delivers an early-access review of GPT-5.4 after a week of hands-on testing, positioning the model as OpenAI’s direct answer to Anthropic’s Opus 4.6. The core architectural argument: where OpenAI previously split coding capability (GPT-5.3 Codex) and general reasoning/personality (GPT-5.2) across two separate models, GPT-5.4 merges both into a single frontier model with a 1-million-token context window, matching Claude’s context offering for the first time.
Berman walks through OpenAI’s benchmark comparisons against Anthropic and Google models — a notable departure from prior releases. GPT-5.4 Thinking scores 75% on OSWorld (computer use) versus Opus 4.6’s 72.7%, 57.7% on SWE-bench Pro, and 83% on OpenAI’s GDPval benchmark — 5 points above Opus 4.6 and 13 points above GPT-5.3 Codex. He flags that GPT-5.4 Pro, the more expensive variant, scores lower than 5.4 Thinking on several benchmarks despite its higher cost. The model also introduces a “plan before build” mode for agentic workflows, letting users review a structured plan before committing tokens to execution.
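The “plan before build” idea can be sketched as a two-phase loop: a cheap planning call produces a structured plan, the user approves or rejects it, and only approved plans proceed to the expensive execution phase. The sketch below is a minimal illustration of that control flow; `request_plan`, `run_with_plan`, and the plan shape are hypothetical stand-ins, not OpenAI’s actual API.

```python
# Hypothetical "plan before build" loop: the model emits a structured plan
# that the user reviews BEFORE any execution tokens are spent.
from dataclasses import dataclass, field


@dataclass
class Plan:
    task: str
    steps: list = field(default_factory=list)


def request_plan(task: str) -> Plan:
    # Stub for the cheap planning call; a real agent would ask the model
    # for a structured plan here.
    return Plan(task=task, steps=["inspect repo", "write patch", "run tests"])


def run_with_plan(task: str, approve) -> list:
    plan = request_plan(task)
    if not approve(plan):  # user review gates the expensive phase
        return []
    # Stub for the execution phase: one model/tool call per approved step.
    return [f"done: {step}" for step in plan.steps]


results = run_with_plan("fix flaky test", approve=lambda p: len(p.steps) <= 5)
print(results)
```

The point of the gate is economic: rejecting a bad plan costs one short planning call rather than a long agentic run.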
Demo highlights include real-time Gmail automation (starring, labeling, creating calendar invites), bulk JSON data entry at what appears to be real-time speed, and vision-driven computer use. Berman identifies knowledge workers — those using AI for document processing, spreadsheets, browser automation, and coding — as the primary target segment, mirroring the positioning Anthropic used for Sonnet 4.6.
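The Gmail demo’s actions (starring, labeling, creating invites) fit the standard agent tool-calling pattern: each action is registered as a named tool, and a dispatcher applies whatever tool calls the model emits. The sketch below shows only that dispatch pattern; the tool names, payload shapes, and handlers are illustrative assumptions, not the demo’s actual code.

```python
# Hypothetical tool registry for the Gmail-style demo: handlers stand in
# for real Gmail/Calendar API calls.
def star_message(msg_id):
    return f"starred {msg_id}"


def apply_label(msg_id, label):
    return f"labeled {msg_id} as {label}"


def create_invite(title, when):
    return f"invite '{title}' at {when}"


TOOLS = {"star": star_message, "label": apply_label, "invite": create_invite}


def dispatch(tool_calls):
    # tool_calls: list of (name, kwargs) pairs, shaped like the structured
    # tool calls a model might emit during an agentic run.
    return [TOOLS[name](**kwargs) for name, kwargs in tool_calls]


log = dispatch([
    ("star", {"msg_id": "m1"}),
    ("label", {"msg_id": "m1", "label": "travel"}),
    ("invite", {"title": "Flight", "when": "2026-03-07T09:00"}),
])
print(log)
```

In a real integration the handlers would wrap authenticated Gmail and Calendar API calls; the model only ever sees the tool names and argument schemas.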
📺 Source: Matthew Berman · Published March 06, 2026
🏷️ Format: Review
