Why can't the large model generate "Ma Jiaqi"? MiniMax's full-vocabulary scan reveals that nearly 5% of tokens are forgotten during post-training.
According to Beating Monitoring, MiniMax published a technical blog detailing the root-cause investigation into why its M2 series large models cannot output the name "Ma Jiaqi." The investigation started from a single case and ultimately uncovered a systemic degradation issue affecting the entire vocabulary.
The root cause is that the tokenizer (the component that splits text into units processed by the model) merged “Jiaqi” into a single independent token during training.
During pretraining, the model saw plenty of internet text and learned this token; but in the subsequent fine-tuning with dialogue data, fewer than five samples contained "Jiaqi."
During fine-tuning, high-frequency tokens such as tool_call markers and code symbols kept updating the surrounding vector space, pushing low-frequency tokens like "Jiaqi" in unintended directions.
The model still “recognizes” Ma Jiaqi and can accurately answer related information; the only loss is its ability to output this token.
The team then conducted a full scan of the complete vocabulary of about 200,000 tokens and found that approximately 4.9% of tokens experienced significant degradation.
The most severe degradation was in Japanese: 29.7% of Japanese tokens degraded significantly, far exceeding Korean (3.3%), Russian (3.7%), Chinese (3.9%), and English (3.5%).
Among the most degraded tokens were "Legend Private Server" and "Painless Abortion," internet SEO spam terms that degraded through exactly the same mechanism as "Jiaqi."
The severe degradation in Japanese also solved an old mystery. Previously, the model occasionally mixed Russian or Korean characters into Japanese conversations, with no clear explanation.
The analysis shows that once the parameters of Japanese tokens drifted, those tokens became entangled with tokens from other languages in the vector space, which both triggered incorrect activation of Japanese tokens (language mixing) and pushed neighboring low-frequency Chinese tokens out of their normal probability range (token forgetting).
The fix was to construct a comprehensive synthetic dataset covering the entire vocabulary, training the model on simple repetition tasks until each token was mastered.
The results were immediate: the proportion of Japanese responses contaminated with Russian characters dropped from 47% to 1%, and the stability of the output parameters across the entire vocabulary (measured by cosine similarity) rose from a low of 0.329 to above 0.97 across the board.