I Tested The Top AI Models. Here’s What Each One Is Best At

I Tested The Top AI Models. Here’s What Each One Is Best At

More

Descriptions:

Sharbel A. reframes the perennial “which AI model is best” debate by assigning specific jobs to specific models — coding, writing, deep research, reasoning, creative brainstorming, everyday assistance, and cost efficiency — and evaluating each through real-world workflows rather than synthetic benchmarks.

Drawing on experience running a marketing agency with roughly $4 million in revenue over two years, Sharbel tests Claude Opus 4.7 via Claude Code, GPT-5.5 via Codex, Gemini, and others across categories. For coding, he finds Opus 4.7 and Codex with GPT-5.5 essentially tied, with Claude leading on multi-file code editing and repo comprehension while Codex impresses on speed and agentic computer use. For writing, Opus 4.7 scores a composite 27 points against GPT-5.5’s 26 on criteria including human-sounding output, brand voice, tension, and filler word density — with Sonnet 4.6 producing cleaner but less distinctive hooks.

The video includes live scoring sessions where AI-generated content is graded in real time, LM Arena leaderboard citations, and SWE-bench-style comparisons for agentic coding. Sharbel also covers which model most people should stop defaulting to, a model he was previously wrong about, and the single model he’d choose if limited to one. The core takeaway: elite AI users aren’t loyal to one model — they route each task to the right tool, treating models less like allegiances and more like specialists on a team.


📺 Source: Sharbel A. · Published May 28, 2026
🏷️ Format: Comparison

1 Item

Channels

1 Item

People