Claude's Chinese Tokenization Costs 65% More Than the English Baseline, OpenAI's Only 15% More

Gate News message, April 29: AI researcher Aran Komatsuzaki compared tokenization efficiency across six major AI tokenizers by translating Rich Sutton’s seminal essay “The Bitter Lesson” into nine languages and running each version through the tokenizers of OpenAI, Gemini, Qwen, DeepSeek, Kimi, and Claude. Using the English version’s token count on OpenAI as the baseline (1x), the analysis found significant disparities: the same content in Chinese required 1.65x tokens on Claude, compared with only 1.15x on OpenAI. Hindi showed an even more extreme result on Claude, exceeding the baseline by over 3x. Anthropic's tokenizer ranked last in efficiency among the six tested.

Critically, when the identical Chinese text was run through different tokenizers, all measured against the same English baseline, the results diverged sharply: Kimi used only 0.81x tokens (fewer than the English baseline itself), Qwen 0.85x, and Claude 1.65x. Because the text is the same in every case, the gap points to tokenizer optimization rather than anything inherent to the Chinese language. Chinese-developed models such as Kimi and Qwen were the most efficient at processing Chinese.
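The normalization behind these figures can be sketched in a few lines of Python. This is a minimal illustration and not Komatsuzaki's actual code: the function name `relative_token_cost` and the raw token counts are assumptions, with the counts chosen only so that the ratios match those reported above. The real counts were not published in this article.

```python
from typing import Dict


def relative_token_cost(counts: Dict[str, int], baseline: str) -> Dict[str, float]:
    """Express every token count as a multiple of the baseline count."""
    base = counts[baseline]
    if base <= 0:
        raise ValueError("baseline token count must be positive")
    return {name: round(n / base, 2) for name, n in counts.items()}


# Hypothetical raw counts (assumption): picked to reproduce the reported
# ratios, with OpenAI's English count as the 1x baseline.
counts = {
    "openai/en": 2000,
    "openai/zh": 2300,
    "claude/zh": 3300,
    "qwen/zh": 1700,
    "kimi/zh": 1620,
}

ratios = relative_token_cost(counts, baseline="openai/en")

# Print from most to least efficient.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name}: {ratio:.2f}x")
```

Dividing every count by a single shared baseline is what makes the numbers comparable across vendors: a ratio below 1x, as with Kimi and Qwen, means the Chinese text took fewer tokens than the English text did on OpenAI.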

The practical implications for users are substantial: higher token consumption directly raises API costs, lengthens model response latency, and uses up context windows faster. Tokenization efficiency depends on the linguistic composition of a model’s training data: models trained predominantly on English compress English text more efficiently, while languages with less data representation are split into smaller, less efficient fragments.
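To make the cost and context-window effects concrete, here is a rough back-of-the-envelope sketch in Python. The price per million tokens and the context window size are hypothetical placeholders, not any vendor's actual figures, and the helper names `scaled_cost` and `effective_context` are illustrative. Only the 1.65x ratio comes from the analysis above.

```python
def scaled_cost(base_tokens: int, ratio: float, price_per_million: float) -> float:
    """Cost of content that needs `base_tokens` at 1x, once inflated by `ratio`."""
    return base_tokens * ratio * price_per_million / 1_000_000


def effective_context(window_tokens: int, ratio: float) -> int:
    """How many baseline-equivalent tokens of content fit in the window."""
    return int(window_tokens / ratio)


RATIO_ZH_CLAUDE = 1.65          # reported ratio for Chinese on Claude
PRICE_PER_MILLION = 3.00        # hypothetical price (assumption)
WINDOW_TOKENS = 200_000         # hypothetical context window (assumption)

baseline_cost = scaled_cost(1_000_000, 1.0, PRICE_PER_MILLION)
chinese_cost = scaled_cost(1_000_000, RATIO_ZH_CLAUDE, PRICE_PER_MILLION)
capacity = effective_context(WINDOW_TOKENS, RATIO_ZH_CLAUDE)

print(f"baseline cost: {baseline_cost:.2f}")
print(f"same content in Chinese: {chinese_cost:.2f}")
print(f"baseline-equivalent tokens that fit: {capacity}")
```

At a fixed per-token price, cost scales linearly with the ratio while usable context shrinks by its inverse. Comparing actual bills across vendors would also require each vendor's real per-token price, which this sketch deliberately leaves out.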

Komatsuzaki’s conclusion underscores a fundamental principle: market size determines tokenization efficiency. Larger markets receive better optimization, while underrepresented languages face significantly higher token costs.

