DeepSeek Releases V4 Open-Source Model Series with 1.6T Parameters and MIT License

Gate News, April 24 — DeepSeek has released the V4 series of open-source models under the MIT License, with weights now available on Hugging Face and ModelScope. The series comprises two mixture-of-experts (MoE) models: V4-Pro, with 1.6 trillion total parameters and 49 billion activated per token, and V4-Flash, with 284 billion total parameters and 13 billion activated per token. Both support a 1-million-token context window.
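
To put those MoE figures in perspective, here is a back-of-the-envelope sketch. Only the parameter counts come from the announcement; the "2 × active parameters" estimate of forward FLOPs per token is a generic rule of thumb for dense computation, not a DeepSeek number.

```python
# Rough per-token cost implied by the reported MoE sizes.
# Parameter counts are from the article; "2 * active params" FLOPs per
# token is a standard dense-forward approximation, not an official figure.

models = {
    "V4-Pro":   {"total": 1.6e12, "active": 49e9},
    "V4-Flash": {"total": 284e9,  "active": 13e9},
}

for name, p in models.items():
    active_frac = p["active"] / p["total"]
    gflops = 2 * p["active"] / 1e9
    print(f"{name}: {active_frac:.1%} of weights active per token, "
          f"~{gflops:.0f} GFLOPs per token")

# V4-Pro:   3.1% of weights active per token, ~98 GFLOPs per token
# V4-Flash: 4.6% of weights active per token, ~26 GFLOPs per token
```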

The architecture features three key upgrades. First, a hybrid attention mechanism combining compressed sparse attention (CSA) and heavily compressed attention (HCA) sharply reduces long-context overhead: at a 1M-token context, V4-Pro’s inference FLOPs are just 27% of V3.2’s, and its KV cache (the VRAM used to store keys and values from earlier tokens during inference) is only 10% the size of V3.2’s. Second, manifold-constrained hyperconnections (mHC) replace traditional residual connections to stabilize cross-layer signal propagation. Third, the Muon optimizer speeds up training convergence. Pre-training used over 32 trillion tokens of data.
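
The article name-checks Muon without elaboration. For orientation, a minimal single-matrix sketch of Muon as openly published (momentum accumulation followed by Newton-Schulz orthogonalization of the update) follows; the coefficients and defaults are from the public reference implementation, and nothing here reflects DeepSeek-specific settings.

```python
import torch

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5,
                                eps: float = 1e-7) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that pushes a 2-D matrix toward the
    # nearest orthogonal one; the (a, b, c) coefficients follow the public
    # Muon reference implementation.
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G.float()
    X = X / (X.norm() + eps)       # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                 # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(param: torch.Tensor, grad: torch.Tensor,
              momentum_buf: torch.Tensor,
              lr: float = 0.02, beta: float = 0.95) -> None:
    # Muon update for one weight matrix: accumulate momentum,
    # orthogonalize it, then apply a plain SGD-style step.
    momentum_buf.mul_(beta).add_(grad)
    param.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```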

Post-training employs a two-stage approach: domain-specific experts are first trained with supervised fine-tuning (SFT) and GRPO reinforcement learning, then merged into a single model through online distillation. V4-Pro-Max (the highest inference intensity mode) is claimed to be the strongest open-source model, with top-tier coding benchmark results and significantly narrowed gaps to closed-source frontier models on reasoning and agent tasks. V4-Flash-Max matches Pro-level reasoning performance given a sufficient compute budget, but its smaller parameter count limits it on pure knowledge and complex agent tasks. Weights are stored in mixed FP4+FP8 precision.
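
GRPO is likewise named without a recipe. Its core idea, as published in DeepSeek’s earlier DeepSeekMath work, is to replace PPO’s learned value baseline with a group-relative advantage computed over several completions sampled for the same prompt. A minimal sketch follows; the verifier-style 0/1 rewards are a hypothetical example, and nothing in the article specifies how V4’s domain experts configure this step.

```python
import torch

def grpo_advantages(rewards: list[float], eps: float = 1e-6) -> torch.Tensor:
    # Group-relative advantage: normalize each completion's reward by the
    # mean and std of all completions sampled for the same prompt.
    r = torch.tensor(rewards, dtype=torch.float32)
    return (r - r.mean()) / (r.std() + eps)

# Hypothetical example: four completions for one coding prompt, scored
# 1.0 if a verifier (e.g. unit tests) accepts them and 0.0 otherwise.
print(grpo_advantages([1.0, 0.0, 0.0, 1.0]))
# tensor([ 0.8660, -0.8660, -0.8660,  0.8660])
```

These advantages then weight a PPO-style clipped policy-gradient objective over the sampled tokens.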

Related Articles

The Trump administration has released an AI model-extraction crackdown plan, accusing Chinese companies of systematically stealing model capabilities

In an official statement released on April 23 by Presidential Assistant Michael J. Kratsios, the White House Office of Science and Technology Policy (OSTP) said the Trump administration has information indicating that foreign entities, mainly based in China, are deliberately targeting major U.S. artificial intelligence companies and systematically extracting U.S. AI model capabilities through “tens of thousands of agent accounts” and jailbreaking techniques. The statement announced four countermeasures at the same time.

MarketWhisper · 7m ago

DeepSeek releases the V4 open-source preview, with a Codeforces score of 3206 surpassing GPT-5.4

DeepSeek officially launched the V4 preview series on April 24, open-sourcing the model weights under the MIT license; the weights are available on Hugging Face and ModelScope. According to the DeepSeek V4 technical report, V4-Pro-Max (the highest inference intensity mode) scored 3206 on the Codeforces benchmark, surpassing GPT-5.4.

MarketWhisper · 23m ago

Cambricon Completes Day 0 Adaptation of DeepSeek-V4, Marking Milestone for China's AI Chip Ecosystem

Gate News, April 24 — Cambricon announced today that it has completed Day 0 adaptation of DeepSeek-V4, the latest large language model from DeepSeek, using its proprietary NeuWare software stack and the vLLM framework. The adaptation code has been open-sourced simultaneously, marking the …

GateNews · 39m ago

Tencent open-sources Hy3 preview; code benchmark scores improve 40% over the previous generation

Tencent officially open-sourced the Hy3 preview version of its large language model on GitHub, Hugging Face, and ModelScope on April 23, and simultaneously offered a paid API service via Tencent Cloud. According to Decrypt’s report on April 24, the Hy3 preview began training in late January and reached public release in less than three months.

MarketWhisper · 46m ago

FTX Portfolio Investments Worth 158 Trillion Won If Not Bankrupt

FTX, the centralized cryptocurrency exchange that filed for Chapter 11 bankruptcy protection in November 2022 due to liquidity shortages and capital outflows, would have held investments valued at approximately 158.796 trillion won if it had not collapsed, according to analysis cited by Park …

CryptoFrontier · 49m ago

Xiaomi Reveals MiMo-V2-Pro Training Details: 1T Model Parameters, Thousands of GPUs Deployed

Gate News, April 24 — Xiaomi’s large language model team lead Luo Fuli disclosed in an in-depth interview that the MiMo-V2-Pro model has 1 trillion total parameters and required thousands of GPUs to train. She noted that the 1T scale represents the minimum threshold to achieve …

GateNews · 1h ago