No weight changes, API-only tuning: Poetiq "plugin" lifts Kimi by 29.9 percentage points, lightweight Gemini overtakes Claude Opus

AIMPACT News, May 15 (UTC+8), according to Beating Monitoring: Poetiq, a six-person startup founded by former Google and DeepMind researchers Shumeet Baluja and Ian Fischer, announced that its Meta-System has set a new record on the programming benchmark LiveCodeBench Pro. The system is an intelligent harness accessed purely through model APIs that automatically extracts experience from tasks through recursive self-improvement. According to Poetiq's own tests, it raises the coding performance of mainstream large models without touching model weights or fine-tuning.

The results indicate that this decoupled external system gives weaker models a substantial lift. With Poetiq attached, Kimi K2.6's accuracy rose from 50.0% to 79.9%, an absolute gain of 29.9 percentage points. The lightweight Gemini 3.0 Flash gained 10 points, overtaking its larger sibling Gemini 3.1 Pro and, according to Poetiq, beating the "bigger and more expensive" Claude Opus 4.7 and GPT 5.2 High. At the top of the range, GPT 5.5 High rose from 89.6% to 93.9% with the external system, while the base Gemini 3.1 Pro paired with it scored 90.9%, ahead of Gemini 3 Deep Think (88.8%), Google's most powerful reasoning model, whose API is not yet publicly available.

The Poetiq team stated that traditional fine-tuning locks its gains onto a single model, whereas their plug-and-play external system lets enterprises obtain stronger reasoning capability without the high cost of fine-tuning or of deploying full-size models. (Source: BlockBeats)
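As a quick check on the arithmetic, the sketch below recomputes the gains for the two models where the article gives both a baseline and a boosted score. The helper name `gains` and the "errors removed" metric are illustrative additions, not something Poetiq reports; only the four scores come from the article.

```python
# LiveCodeBench Pro accuracy (percent) as reported in the article:
# (baseline, with Poetiq). Other models are omitted because the article
# does not give both numbers for them.
reported = {
    "Kimi K2.6": (50.0, 79.9),
    "GPT 5.5 High": (89.6, 93.9),
}


def gains(baseline: float, boosted: float) -> dict:
    """Hypothetical helper: summarize an accuracy improvement three ways."""
    points = boosted - baseline                    # absolute gain, percentage points
    relative = points / baseline * 100             # gain relative to the baseline score
    errors_removed = points / (100 - baseline) * 100  # share of baseline failures fixed
    return {
        "points": round(points, 1),
        "relative_pct": round(relative, 1),
        "errors_removed_pct": round(errors_removed, 1),
    }


if __name__ == "__main__":
    for model, (base, boosted) in reported.items():
        print(model, gains(base, boosted))
```

Read this way, Kimi K2.6's 29.9-point gain removes about 59.8% of its baseline failures, while GPT 5.5 High's smaller 4.3-point gain still removes about 41.3% of the failures it had left.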
On-ChainSoilAfterTheRain
· 6h ago
GPT 5.5 High goes straight to 93.9%; this plugin does more than an official upgrade.
SlothSignal
· 6h ago
Wait, Gemini 3.1 Pro with the plugin only reaches 90.9, and natively it can't even beat the plugin-boosted 3.0 Flash? That comparison is pretty ironic.
ForkItAnyway
· 6h ago
Recursive self-improvement plus a pure-API plugin: this approach is wild. Without changing weights, Kimi K2.6 jumps from 50 to 79.9, and companies really can save a lot on fine-tuning costs.
VolatilityInATeacup
· 6h ago
Kimi is the big winner here; the jump from 50 to 79.9 is much faster than their own iteration.
PaperHandsPro
· 7h ago
Enterprise deployments should lean heavily on this approach: no need to stockpile GPUs or run RLHF, and the gains come at the API level.
Post-RainReflectionsMarket
· 7h ago
No touching weights and no fine-tuning, just experience extraction and recursive improvement. It's quite a clever approach, and it sidesteps a lot of compliance and cost issues.
Frictionless
· 7h ago
Six people at Poetiq built this Meta-System. Pretty impressive.