Description:
AI Explained’s Philip covers OpenAI’s GPT 5.2 release through nine data points designed to go beyond the headline claims, paying particular attention to what the benchmarks actually measure and where the numbers can mislead. The central throughline is test-time compute: performance on modern AI benchmarks is increasingly a function of how many tokens a model is allowed to spend thinking, which makes direct comparisons between models running at different compute budgets structurally unreliable.
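To see why mismatched thinking budgets can distort a comparison, here is a minimal sketch assuming a toy saturating accuracy curve. Every number in it (ceilings, scales, budgets) is invented for illustration and does not describe GPT 5.2, Gemini 3 Pro, or any real model.

```python
import math

def accuracy(budget_tokens: int, ceiling: float, scale: float) -> float:
    """Toy saturating curve: accuracy approaches `ceiling` as the budget grows."""
    return ceiling * (1 - math.exp(-budget_tokens / scale))

# Invented parameters for two hypothetical models; not measurements of any real model.
MODEL_A = dict(ceiling=0.80, scale=8_000)  # higher ceiling, needs more tokens to reach it
MODEL_B = dict(ceiling=0.75, scale=2_000)  # lower ceiling, cheap early gains

# Mismatched budgets, as in many release-day comparisons: A thinks 4x longer than B.
print(f"A@32k={accuracy(32_000, **MODEL_A):.3f}  B@8k={accuracy(8_000, **MODEL_B):.3f}")

# Matched budgets: the apparent winner depends entirely on where you sample the curve.
for budget in (4_000, 16_000, 64_000):
    a, b = accuracy(budget, **MODEL_A), accuracy(budget, **MODEL_B)
    print(f"@{budget:>6} tokens: A={a:.3f}  B={b:.3f}")
```

With these made-up curves, the model that looks well ahead under a mismatched comparison trails at small matched budgets and only pulls ahead at large ones, which is exactly why the video treats cross-budget comparisons as unreliable.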
On GDPval, OpenAI’s headline benchmark for professional knowledge work across 44 occupations, GPT 5.2 is reported to reach expert-level performance in 71% of comparisons. The video carefully unpacks the limitations: tasks must be predominantly digital, are well specified in advance, and the benchmark explicitly excludes catastrophic errors. On the presenter’s own private SimpleBench, GPT 5.2 scores 57.4% against Gemini 3 Pro’s 76.4% and a human baseline of roughly 84%. On chart reasoning via the CharXiv benchmark, GPT 5.2 leads at 88.7% versus Gemini 3 Pro’s 81%. Humanity’s Last Exam and GPQA Diamond are roughly tied between the two models at around 45-46%.
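For readers unsure what “expert-level performance in 71% of comparisons” means mechanically, here is a minimal sketch of a GDPval-style pairwise tally. The grade labels and the win-or-tie reading of the metric are assumptions for illustration, not OpenAI’s published scoring spec.

```python
from collections import Counter

# Invented pairwise grades for a handful of tasks; real GDPval grading is done
# blind by industry experts, and these labels are purely illustrative.
grades = ["model", "expert", "tie", "model", "model",
          "expert", "tie", "model", "expert", "model"]

counts = Counter(grades)
# "Expert-level in X% of comparisons" reads as wins plus ties over all
# comparisons; this is a simplified reading of the metric, not OpenAI's spec.
expert_parity = (counts["model"] + counts["tie"]) / len(grades)
print(f"expert-parity rate: {expert_parity:.0%}")  # 70% for this toy sample
```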
The video also flags that OpenAI did not compare GPT 5.2 against Claude Opus 4.5 or Gemini 3 Pro in its own release materials, a departure from the past practice the presenter had praised. A lead author of GPQA is quoted acknowledging that 5-10% of that benchmark’s questions may be noise. Practical testing on a football results spreadsheet task shows notable performance differences between the $200 Pro tier (GPT 5.2 Pro) and the standard version, raising the question of which model tier the headline benchmark numbers actually represent.
📺 Source: AI Explained · Published December 12, 2025
🏷️ Format: Benchmark Test
