Grok 4.20 Beta achieved a 97% accuracy rate in the τ²-Bench evaluation, ranking second.

MeNews · 2026-04-09T08:17:18+00:00

ME News, April 5 (UTC+8): Grok 4.20 Beta recently achieved 97% accuracy in the τ²-Bench evaluation, ranking second. τ²-Bench is built on Sierra's original τ-bench framework and is known for its rigor. It tests more than whether an AI can answer questions: it also tests whether agents can successfully complete navigation tasks. (Source: InfoQ)
