How METR measures Long Tasks and Experienced Open Source Dev Productivity – Joel Becker, METR
Joel Becker from METR (Model Evaluation and Threat Research) presents the organization's framework for measuring AI agent task horizo...