Descriptions:
Researchers built SimWorld, a procedurally generated city simulation complete with roads, traffic, restaurants, and a working delivery economy, then populated it with AI agents running on ChatGPT, Gemini, DeepSeek, Claude, Qwen, and GPT-4o-mini to see how different models behave under competitive and economic pressure. The results, covered by Two Minute Papers host Dr. Károly Zsolnai-Fehér, are both quantitatively specific and genuinely surprising.
DeepSeek and Claude pursued aggressive, high-variance strategies and earned close to 70 units of profit, but with enormous swings. Gemini played conservatively, landing around 42 units with much lower volatility. GPT-4o-mini scored exactly zero, unable to parse the game rules at all. The researchers also assigned Big Five personality traits to agents: conscientious agents consistently outperformed others by ignoring upgrades and executing tasks, while “open” agents went bankrupt buying equipment they never used. Disagreeable agents refused to work at all. Price wars emerged organically, with DeepSeek and Qwen undercutting rivals aggressively to win contracts while ChatGPT held its prices and lost business.
One of the more counterintuitive findings: when the market was flooded with delivery orders, agents became less productive rather than more, opting to wait for ideal opportunities. The study offers a rare look at how frontier AI models differ not just in capability but in economic temperament and risk tolerance when placed inside structured competitive environments.
📺 Source: Two Minute Papers · Published December 14, 2025
🏷️ Format: Showcase







