I Tested The Top AI Models. Here’s What Each One Is Best At

Research & Benchmarks · 2 months ago

Descriptions:

Sharbel A. reframes the perennial “which AI model is best” debate by assigning specific jobs to specific models — coding, writing, deep research, reasoning, creative brainstorming, everyday assistance, and cost efficiency — and evaluating each through real-world workflows rather than synthetic benchmarks.

Drawing on experience running a marketing agency with roughly $4 million in revenue over two years, Sharbel tests Claude Opus 4.7 via Claude Code, GPT-5.5 via Codex, Gemini, and others across these categories. For coding, he finds Opus 4.7 and Codex with GPT-5.5 essentially tied: Claude leads on multi-file code editing and repo comprehension, while Codex impresses on speed and agentic computer use. For writing, Opus 4.7 scores a composite 27 points against GPT-5.5’s 26 on criteria including human-sounding output, brand voice, tension, and filler-word density. Sonnet 4.6 produces cleaner but less distinctive hooks.

The video includes live scoring sessions where AI-generated content is graded in real time, LM Arena leaderboard citations, and SWE-bench-style comparisons for agentic coding. Sharbel also covers which model most people should stop defaulting to, a model he was previously wrong about, and the single model he’d choose if limited to one. The core takeaway: elite AI users aren’t loyal to one model — they route each task to the right tool, treating models less like allegiances and more like specialists on a team.

📺 Source: Sharbel A. · Published May 28, 2026
🏷️ Format: Comparison

Tags

Anthropic, Claude Code, Claude Opus 4.7, Claude Sonnet 4.6, Codex, Gemini 3.1 Pro, GPT Image 2, GPT-5.5, Grok, Hermes, Kimi K2.6, LM Arena, OpenAI, OpenClaw, Qwen, Qwen 3.6, Sharbel A.
