No hype Claude Opus 4.8 review—my real experience

Research & Benchmarks · 2 months ago

Clarvo, a product leader who received early access to Claude Opus 4.8, delivers one of the first substantive hands-on reviews of the model across both coding and business use cases. Testing took place inside a real product (a prototyping capability built into an internal tool called Chat Pod), which makes this more grounded than most benchmark-first takes. The overall pattern: Opus 4.8 handles the first 90% of a task well, planning autonomously and shipping working code, but degrades on edge cases, on navigating an existing codebase, and on anything requiring sustained precision.

The most notable finding is a documented hallucination during bug-hunting in high-effort mode, where the model presented fabricated details as fact; Clarvo notes this is rare for a modern frontier model. In a head-to-head comparison on a business strategy task (analyze three months of activity, then generate a growth roadmap), Opus 4.7 outperformed 4.8 in data grounding and specificity, while 4.8 over-rotated on minor data points and produced vaguer strategic output.

On the benchmark side, Opus 4.8 scores 69.2% on SWE-bench Pro, roughly 5 points ahead of 4.7 and about 10 points above GPT-5.5, and is priced at $5 per million input tokens and $25 per million output tokens. For developers deciding whether to upgrade from 4.7 or switch from a competing model, this review documents the concrete failure modes that benchmark tables alone can't show.
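
As a rough illustration of what the quoted pricing means per request, the sketch below turns token counts into a dollar estimate. Only the two prices come from the text above; the function name and the example workload are hypothetical, and real billing may include factors not covered here.

```python
# Prices as quoted in the review, in USD per million tokens.
INPUT_USD_PER_MILLION = 5.0
OUTPUT_USD_PER_MILLION = 25.0


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the quoted per-token prices.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost


if __name__ == "__main__":
    # Assumed workload: 200k tokens of codebase context in, 20k tokens out.
    print(f"${estimate_cost_usd(200_000, 20_000):.2f}")
```

Because output tokens cost five times as much as input tokens at these prices, long generated answers dominate the bill faster than large contexts do.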

📺 Source: How I AI · Published May 28, 2026
🏷️ Format: Review

Tags

Anthropic, Claude Code, Claude Opus 4.7, Claude Opus 4.8, claude.ai, Gemini 3.1, GPT-5.5
