Artificial Analysis: The Independent LLM Analysis House — with George Cameron and Micah Hill-Smith

Descriptions:

George Cameron and Micah Hill-Smith, co-founders of Artificial Analysis, join Latent Space to discuss how their independent AI benchmarking company has grown in two years from a free website into a 20-person organization serving enterprises and AI companies with subscription benchmark reports and custom private evaluations. The conversation covers their business model and their firm commitment to keeping public benchmarks independent — no company pays to appear in their rankings.

On the technical side, Cameron and Hill-Smith walk through several of their evaluations, including a newly launched hallucination rate benchmark and a hard physics benchmark where the top-performing model scores just 9%. They share a notable cross-lab finding: Gemini 3 Pro showed a large accuracy improvement over prior Gemini models but essentially no change in hallucination rate, in contrast to Claude model families — a pattern they attribute to different post-training recipes across labs rather than any difference in underlying capability.

The discussion also explores methodological challenges in independent AI evaluation: how to get the industry to converge on shared metrics when every lab publishes self-reported numbers on its own system cards, why public benchmarks now saturate within months of release, and when hallucination is actually desirable, such as when physics researchers use high-temperature sampling to explore novel hypotheses. For developers and enterprise teams navigating model selection, the episode offers both a portrait of the most systematic independent LLM evaluation operation and a substantive look at the current state of AI measurement.


📺 Source: Latent Space · Published January 09, 2026
🏷️ Format: Interview