ARC AGI 2 - Frontier Models

There are 20 items on this page

01:24:35

Interviews · 3 days ago

ARC-AGI-3 Explained by the Team That’s Winning It

Machine Learning Street Talk convenes several members of a top-performing ARC-AGI-3 competition team — Jon Kotar, Stephano, D. Smith,...

23:25

Foundation Models · 4 weeks ago

The Art & Science of Benchmarking Agents — Vincent Chen, Snorkel AI

Vincent Chen, research fellow and co-founder at Snorkel AI, took the stage at AI Engineer to share meta-level lessons on what separat...

27:13

Coding & Dev Tools · 2 months ago

GPT-5.5 is a total freak

This video puts OpenAI's newly released GPT-5.5 through a series of demanding real-world coding challenges, moving well beyond the st...

02:05:30

Interviews · 2 months ago

GPT 5.5 LET’S GOOOOOOOO!

Wes Roth and Dylan Curious co-stream a live community first-look at GPT-5.5 on the day of its release, testing the model across ChatG...

19:41

Research & Benchmarks · 3 months ago

Claude Opus 4.7 – A New Frontier, in Performance … and Drama

AI Explained takes a thorough and critical look at Claude Opus 4.7, Anthropic's latest flagship model released in April 2026, coverin...

15:55

Business & Strategy · 3 months ago

All of AI’s New Models and Tools

The AI Daily Brief delivers a dense weekly roundup covering every significant model and tool release from the past seven days, anchor...

12:10

Benchmarks · 3 months ago

New Tests Reveal The Truth About China’s AI Progress…

TheAIGRID examines new benchmark data challenging the prevailing narrative that Chinese AI labs have caught up with Western frontier...

11:15

Business & Strategy · 3 months ago

ARC AGI 3 just dropped, what it means for AGI

ARC-AGI-3 has officially launched as the first interactive version of the Abstraction and Reasoning Corpus benchmark, and Matthew Ber...

29:54

Research & Benchmarks · 4 months ago

GPT 5.4 is so cracked

The AI Search channel puts OpenAI's newly released GPT 5.4 through a rigorous set of real-world capability tests, going well beyond s...

16:29

Research & Benchmarks · 4 months ago

Gemini 3.1 Pro in Antigravity can do anything… just watch

David Ondrej tests Gemini 3.1 Pro inside Antigravity, Google's multi-agent IDE, using it to build an autonomous web scraping agent w...