Real World AI Evaluations

Business & Strategy · 5 months ago

Description:

Artificial Analysis has built an open evaluation harness on top of OpenAI’s GDPval benchmark, which measures AI performance on economically meaningful knowledge-work tasks across 44 occupations. The harness uses an AI grading pipeline so the benchmark can be run on any LLM at scale. In the initial run, Anthropic’s Claude Opus 4.5 topped the leaderboard at a cost of $68, followed by GPT-5 in second, Claude Sonnet 4.5 in third, GPT-5.1 in fourth (using half the tokens with a slight quality drop), and DeepSeek V3.2 and Gemini 3 Pro tied for fifth. DeepSeek V3.2 stood out for cost efficiency, completing the benchmark for $29, reportedly about one-twentieth the cost of Opus.

The episode’s biggest story, however, is a report from The Information claiming that DeepSeek built a Blackwell GPU training cluster using chips smuggled into China. According to six sources with knowledge of the matter, Nvidia servers were delivered to third-country data centers, inspected for export compliance, then dismantled and transported into China as individual components. Nvidia disputed the report but said it would “pursue any tip we receive.” If accurate, this would mark the first confirmed instance of a Chinese lab building a commercial-scale training cluster on export-banned hardware, a significant escalation in the chip war.

The episode rounds out with Beijing holding emergency meetings with Alibaba, ByteDance, and Tencent to assess H200 import demand, and a note that ChatGPT is approaching 900 million weekly active users.

📺 Source: The AI Daily Brief: Artificial Intelligence News · Published December 15, 2025
🏷️ Format: News Analysis

Companies

DeepSeek

Nvidia

OpenAI

Oracle

Tags

Alibaba, Artificial Analysis, Blackwell, ByteDance, ChatGPT, China, Claude Opus 4.5, Claude Sonnet 4.5, DeepSeek, DeepSeek V3.2, GDPval, Gemini 3 Pro, GPT-5, GPT-5.1, H200, Nvidia, OpenAI, Oracle, Tencent
