Research & Benchmarks - Frontier Models

There are 263 items on this page

14:03

Research & Benchmarks - 21 hours ago

Fable 5 is Back! Here’s the Best Way to Use It…

With Fable 5 restored to public access on July 1st following government export control restrictions, The AI Advantage breaks down the...

21:10

Research & Benchmarks - 21 hours ago

I Tested Gemini Spark: What Google’s AI Agent Can Actually Do in 21 Minutes

Peter Yang offers a 21-minute live walkthrough of Gemini Spark, Google's personal AI agent embedded inside the Gemini interface and c...

10:50

Research & Benchmarks - 2 days ago

Laguna XS 2.1: Poolside’s Local Coding Agent Tested – Nine Languages

Fahd Mirza puts Poolside's newly released Laguna XS 2.1 through a live evaluation using the Hermit agentic framework. The model is a...

12:40

Research & Benchmarks - 3 days ago

Sonnet 5 vs Ornith 35B: Can a Local Model Beat Closed-Source?

Fahd Mirza runs a direct head-to-head evaluation between Claude Sonnet 5 and Ornith 1 35B — a new open-source mixture-of-experts codi...

10:26

Research & Benchmarks - 3 days ago

NotebookLM’s Brand New Feature Generates Shorts With One Click

Google's NotebookLM has launched a new "short video overviews" feature that generates polished, educational short-form videos from up...

28:52

Research & Benchmarks - 3 days ago

GLM-5.2 Proves Open-Source AI is Finally Good Now!

Matt Wolfe puts ZAI's GLM-5.2 through an extended hands-on evaluation, starting with a clear-eyed explanation of what 'open-weight' a...

12:25

Research & Benchmarks - 4 days ago

LongCat-2.0: China Breaks Free From Nvidia to Train a 1.6T Model

Meituan — better known outside China as a food-delivery giant — has quietly released LongCat 2.0, a 1.6-trillion-parameter mixture-of...

26:20

Research & Benchmarks - 5 days ago

GLM-5.2 vs MiniMax-M3: Opus Has REAL COMPETITION (Model Stacking)

IndyDevDan makes the case that the open-weight model landscape has fundamentally changed, positioning GLM-5.2 from Zhipu AI as a genu...

15:41

Research & Benchmarks - 5 days ago

New Agentic Coding Model Ornith 9B — Is It Worth Running Locally?

Bart Slodyczka tests Ornith 1.0, a newly released open-source agentic coding model family from Deep Reinforce AI (San Francisco), foc...

17:36

Research & Benchmarks - 6 days ago

GLM 5.2 Is Free And Beats Claude On Most Work. So Why Can’t Companies Switch?

Nate B. Jones of AI News & Strategy Daily delivers a candid, firsthand evaluation of GLM 5.2, an open-source model from Zhipu AI...

16:35

Research & Benchmarks - 1 week ago

Introducing Ornith 1.0

Sam Witteveen introduces Ornith 1.0, a new family of open-weight models from Deep Reinforce that takes a fundamentally different appr...

09:41

Research & Benchmarks - 1 week ago

OpenAI Just Gave Codex a Superpower & More AI News You Can Use

The AI Advantage reviews OpenAI Codex's newly released record-and-replay feature — currently Mac-only and requiring at least a $20/mo...

27:14

Research & Benchmarks - 1 week ago

GLM 5.2 is SO GOOD (and almost free)

This video from the How I AI channel takes a close look at GLM 5.2, the latest open-weight model from Beijing-based startup Z.AI. The...

10:04

Research & Benchmarks - 1 week ago

Mistral OCR 4 Is Built Different – 170 Languages, and Does It Beats Them All?

Fahd Mirza, a Mistral AI ambassador, delivers a no-hype hands-on review of Mistral OCR 4, the company's new document extraction model...

12:16

Research & Benchmarks - 2 weeks ago

I Battle Tested Sakana Fugu’s Fable Killer

Sakana AI, a Japanese company, has launched Fugu Ultra — a multi-agent orchestration system that routes tasks through multiple fronti...

21:01

Research & Benchmarks - 2 weeks ago

GLM-5.2 vs MiniMax-M3 vs Qwen3.7-Max — 3 Coding Tests, One Winner

Fahd Mirza runs a hands-on three-way coding showdown between GLM-5.2, MiniMax M3, and Qwen 3.7 Max using the Hermes agent framework....