Researcher: All mainstream AI benchmarks can be "faked," and leaderboard data may be seriously distorted

CoinNetwork · 2026-04-10T02:43:55+00:00

Research shows that multiple AI benchmarks have exploitable systemic vulnerabilities that allow agents to achieve full scores without completing the tasks. The team identified various common vulnerabilities and developed an automated analysis tool, WEASEL, to locate and exploit these vulnerabilities.

CoinNetwork

2026-04-10 02:43:55

Abstract generation in progress

Crypto界网消息，ME News消息，4月10日（UTC+8），AI研究员Hao Wang发布研究披露，包括SWE-bench Verified和Terminal-Bench在内的多个业内最权威的AI基准测试，均存在可被系统性利用的漏洞——其团队构建的Agent在未解决任何实际任务的情况下，在两项基准上均取得了满分100%。
典型案例如下：
SWE-bench Verified：在代码仓库中植入一个10行的pytest钩子，测试运行前自动将所有结果篡改为“通过”，评分系统对此毫无察觉，500道题全部满分；
Terminal-Bench：该基准虽对测试文件加以保护，但未保护系统二进制文件。Agent替换了curl，拦截验证器的依赖安装流程，完成底层劫持；
WebArena：参考答案以明文存储在本地JSON配置文件中，且Playwright Chromium未限制file://协议访问，模型可直接读取答案后原样输出。
团队对8个基准测试的审计发现了7类重复出现的共性漏洞，包括：Agent与评估器之间缺乏隔离、答案随测试一同下发、LLM裁判易遭提示注入攻击等。
值得警惕的是，评估系统绕过行为已在o3、Claude 3.7 Sonnet及Mythos Preview等前沿模型中被自发观察到，无需显式指令触发。
团队据此开发了基准测试漏洞扫描工具WEASEL，可自动分析评估流程、定位隔离边界薄弱点并生成可用漏洞利用代码，相当于针对基准测试的“渗透测试”工具，目前开放早期访问申请。

View Original

This page may contain third-party content, which is provided for information purposes only (not representations/warranties) and should not be considered as an endorsement of its views by Gate, nor as financial or professional advice. See Disclaimer for details.

1 Likes