Gate News, April 10: LMArena, a globally recognized AI model evaluation platform where millions of users take part in blind tests, today updated its Code Arena rankings. GLM-5.1 ranked first among open-source models and third among all models worldwide.
GLM-5.1 not only inherits the open-source SOTA coding capabilities of its predecessor but also makes breakthroughs on long-horizon tasks: it built a Linux desktop from scratch in 8 hours, completed 655 iterations to break through an optimization bottleneck in a vector database, and optimized real machine learning workloads over 1,000 rounds of tool calls.
Notably, under the same evaluation criteria on the METR leaderboard, GLM-5.1 is the only open-source model to sustain work at the 8-hour level and, alongside Claude Opus 4.6, one of only a few models worldwide with this capability.