OpenAI just shipped the Mythos killer (GPT 5.5)

Research & Benchmarks · 2 months ago

Description:

David Ondrej gives a day-zero first look at GPT-5.5, recorded within minutes of its public release, and frames the model as OpenAI’s direct answer to Anthropic’s as-yet-unreleased Claude Mythos. Unlike Mythos, GPT-5.5 is available immediately to ChatGPT Pro users and inside Codex. Ondrej goes straight to benchmarks and live testing. The model scores 82.7 on TerminalBench and 56.6 on SWE-bench Pro, a benchmark built from real GitHub issues. It also comes with claimed improvements over GPT-5.4 in understanding user intent and in producing higher-quality output with fewer tokens.

The hands-on portion centers on Codex. Ondrej asks GPT-5.5 to recreate a complex UI (a unicorn-themed graphic) from a single screenshot, and the reproduction comes out nearly identical without manual correction. He then chains image generation, computer use, and Codex’s CLI skill together to try building a Doom-style macOS game from scratch. For this he uses the newly updated Codex interface, which now includes a built-in browser preview window.

Ondrej also presents a competitive analysis arguing that Anthropic is facing a compute shortage: users are reporting instruction-following regressions in Claude Opus 4.7 compared to 4.6, usage limits are tightening, and Mythos remains unshipped. He notes he currently defaults to Claude Opus 4.6 in Claude Code because 4.7 is less reliable for instruction-following, and suggests that if GPT-5.5 delivers on its agentic coding claims, 2026 could mark a meaningful shift in the OpenAI-Anthropic competitive balance.

📺 Source: David Ondrej · Published April 23, 2026
🏷️ Format: Review

Channel: David Ondrej

Companies: Anthropic, OpenAI

Tags: Anthropic, ChatGPT, Claude Mythos, Claude Opus 4.6, Claude Opus 4.7, Codex, Dario Amodei, David Ondrej, GPT Image 2, GPT-5.4, GPT-5.5, OpenAI
