GPT 5.4 is so cracked



The AI Search channel puts OpenAI’s newly released GPT 5.4 through a rigorous set of real-world capability tests, going well beyond simple prompts to probe the model’s limits across coding, creative composition, medical imaging, and document analysis. All major demos are run inside OpenAI’s Codex coding agent, which works across entire multi-file projects rather than single-file outputs.

Standout tests include building a fully interactive 3D digital twin of Earth with seamless zoom from orbit to street level, achieved in just three or four iterative prompts, and composing a 32-bar piano piece described as notably more musically complex than outputs from competing models like Gemini 3.1 and GLM5. The video also covers GPT 5.4’s multimodal capabilities. The model is asked to identify and annotate lesions in CT scan imagery. Separately, after 17 minutes of extended thinking, it synthesizes earnings reports from Google, Nvidia, and Amazon into a single formatted PDF with charts, growth forecasts, and analyst recommendations.

The reviewer notes that while GPT 5.4 leads on reasoning-heavy and multimodal tasks, it lags behind some competitors in front-end design quality. The extended thinking mode (set to “extra high” reasoning effort) is used consistently throughout, giving a clear sense of the model’s top-end performance ceiling across diverse domains.

📺 Source: AI Search · Published March 07, 2026
🏷️ Format: Review


Tags

Anthropic, ARC AGI 2, Artificial Analysis, ChatGPT, Claude Opus 4.6, Codex, GDP Val, Gemini 3.1 Pro, GLM5, GPT Image 1.5, GPT-5.2, GPT-5.4, LM Arena, OpenAI
