Opus 4.7 just dropped… and I’m confused.

Description:

Matthew Berman analyzes the Claude Opus 4.7 release with a focus on what the benchmark numbers reveal about Anthropic’s broader model strategy, and on why Anthropic’s simultaneous publication of the unreleased Mythos preview complicates the picture. Opus 4.7 posts a substantial jump on SWE-bench Pro (53.4% to 64.3%) and SWE-bench Verified (80% to 87%), while regressing on agentic search: BrowseComp dropped from 83.7% to 79.3%. The cybersecurity vulnerability-reproduction score held nearly flat at 73.1%, versus 83.1% for the Mythos preview, which Berman suggests may be a deliberate design choice rather than a training limitation.

The video’s central argument is that Anthropic’s business flywheel runs through coding: build the best coding model, monetize it to enterprises, reinvest in GPU capacity, and use the model to build the next generation. Opus 4.7 closed nearly half the gap between 4.6 and Mythos on coding benchmarks in a single point release, raising the question of where the safety line actually sits as Opus iterations continue. Berman’s working theory is that Mythos represents an entirely new training run (rumored at 10 trillion parameters, versus roughly 1 trillion for the Opus family), making it categorically different even in its unpolished first version.
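A quick back-of-the-envelope showing how a "fraction of the gap closed" figure like this is computed. The 4.6 and 4.7 SWE-bench Pro scores come from the video; the Mythos preview score is a hypothetical placeholder, since the video does not state one:

```python
# Fraction of the 4.6 -> Mythos gap closed by 4.7 on SWE-bench Pro.
# opus_46 and opus_47 are from the video; mythos_preview is a
# HYPOTHETICAL placeholder, used only to illustrate the arithmetic.
opus_46, opus_47 = 53.4, 64.3
mythos_preview = 76.0  # assumed for illustration; not stated in the video

gap_closed = (opus_47 - opus_46) / (mythos_preview - opus_46)
print(f"{gap_closed:.0%} of the 4.6 -> Mythos gap closed")  # ~48%
```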

Practical guidance for users includes updated prompting advice: Opus 4.7 is significantly more literal than 4.6 and should not be treated as a drop-in replacement. Prompts that relied on the earlier model to fill in ambiguous instructions should be rewritten to be fully explicit before they are deployed in production or agentic workflows, as in the sketch below.
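A minimal sketch of that kind of rewrite, using the Anthropic Python SDK. The model ID and both prompts are illustrative assumptions rather than examples taken from the video:

```python
# Comparing an old, vague prompt against a rewritten, fully explicit one.
# The model ID is a hypothetical placeholder; use the ID Anthropic publishes.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SNIPPET = "def calc(a,b): return a+b if a>0 else b"

# Relied on the earlier model to infer what "clean up" means:
VAGUE = f"Clean up this function:\n\n{SNIPPET}"

# Rewritten for a more literal model: explicit scope, constraints, output shape.
EXPLICIT = (
    "Refactor the Python function below. Keep its name, parameters, and "
    "behavior identical. Add type hints and a one-line docstring. "
    f"Return only the updated code, with no commentary.\n\n{SNIPPET}"
)

for label, prompt in [("vague", VAGUE), ("explicit", EXPLICIT)]:
    response = client.messages.create(
        model="claude-opus-4-7",  # hypothetical ID, for illustration only
        max_tokens=512,
        messages=[{"role": "user", "content": prompt}],
    )
    print(f"--- {label} ---\n{response.content[0].text}\n")
```

The same pattern generalizes: anywhere a prompt leans on the model's judgment to resolve ambiguity, spell out the scope, the constraints, and the expected output format before switching model versions.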


📺 Source: Matthew Berman · Published April 16, 2026
🏷️ Format: News Analysis
