The Meta-System built by Poetiq's six-person team has posted the highest score on LiveCodeBench Pro. The system is accessed purely through model APIs and uses recursive self-improvement to extract task experience, without touching weights or fine-tuning; the gains are largest for weaker models. With it, Kimi K2.6 rose from 50.0% to 79.9%, and Gemini 3.0 Flash gained 10 points, overtaking Gemini 3.1 Pro, Claude Opus 4.7, and GPT 5.2 High. GPT 5.5 High reached 93.9% with the system, and Gemini 3.1 Pro reached 90.9%, ahead of Gemini 3 Deep Think. Poetiq says enterprises can improve reasoning capability without costly fine-tuning.

MeNews

2026-05-23 20:04:52


AIMPACT News, May 15 (UTC+8): According to Beating Monitoring, Poetiq, a six-person startup founded by former Google and DeepMind researchers Shumeet Baluja and Ian Fischer, announced that its Meta-System has set a new record on the programming benchmark LiveCodeBench Pro. The system is a harness accessed purely through model APIs that automatically extracts task experience through recursive self-improvement. According to Poetiq's own tests, it raises the coding performance of mainstream large models without touching model weights or fine-tuning.

The gains are largest for weaker models. With Poetiq, Kimi K2.6's accuracy rose from 50.0% to 79.9%, an absolute gain of 29.9 percentage points. The lightweight Gemini 3.0 Flash improved by 10 points, overtaking the larger Gemini 3.1 Pro and, according to Poetiq, also beating the "bigger and more expensive" Claude Opus 4.7 and GPT 5.2 High.

At the top end, GPT 5.5 High rose from 89.6% to 93.9% with the external system, while the base Gemini 3.1 Pro paired with it scored 90.9%, ahead of Gemini 3 Deep Think (88.8%), Google's most powerful reasoning model, whose API has not yet been opened. The Poetiq team said that traditional fine-tuning ties the improvement to a single model, whereas its plug-and-play external system lets enterprises avoid the high cost of fine-tuning and of deploying full-capacity models for reasoning capabilities. (Source: BlockBeats)
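For readers who want to compare the reported gains on a common footing, the short Python sketch below recomputes the absolute gain in percentage points and, as an extra illustration not taken from Poetiq, the share of previously failed problems that would now be solved. The function names are hypothetical, the scores are copied from the article as reported, and the second metric assumes both scores were measured on the same problem set.

```python
# Scores in percent, copied from the article as reported by Poetiq.
# Nothing here is independently measured.
REPORTED = {
    "Kimi K2.6": (50.0, 79.9),
    "GPT 5.5 High": (89.6, 93.9),
}


def absolute_gain(base: float, boosted: float) -> float:
    """Gain in percentage points, rounded to one decimal."""
    return round(boosted - base, 1)


def error_reduction(base: float, boosted: float) -> float:
    """Percent of the previously failed share that is now solved.

    Assumption: both scores refer to the same problem set.
    """
    return round(100 * (boosted - base) / (100 - base), 1)


if __name__ == "__main__":
    for model, (base, boosted) in REPORTED.items():
        print(
            f"{model}: +{absolute_gain(base, boosted)} points, "
            f"{error_reduction(base, boosted)}% of remaining errors removed"
        )
```

On these reported figures, the smaller absolute gain for GPT 5.5 High still corresponds to a sizeable share of its remaining failures, which is why the two ends of the table are hard to compare by point gain alone.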




On-ChainSoilAfterTheRain

· 6h ago

GPT 5.5 High goes straight to 93.9%; this plugin is even more powerful than an official upgrade.


SlothSignal

· 6h ago

Wait, after installing the Gemini3.1 Pro plugin, it's only 90.9; the native 3.0 Flash that can't be beaten by plugins? This comparison is too ironic.


ForkItAnyway

· 6h ago

Recursive self-improvement plus a pure API plugin: this approach is wild. Without changing weights, Kimi K2.6 can jump from 50 to 79.9, and companies can indeed save a lot on fine-tuning costs.


VolatilityInATeacup

· 6h ago

Kimi just won big, the jump from 50 to 79.9 is much faster than their own iteration.


PaperHandsPro

· 7h ago

Enterprise deployments should lean heavily on this approach: there is no need to stockpile GPUs or run RLHF, and efficiency can be improved at the API level.


Post-RainReflectionsMarket

· 7h ago

Without touching weights or fine-tuning, relying solely on experience-based extraction and recursive improvement, this approach is quite clever, avoiding a bunch of compliance and cost issues.


Frictionless

· 7h ago

Poetiq's six people came up with this Meta-System; it's pretty impressive.
