Kaiming He's Team at MIT Releases ELF, a Language Diffusion Model Trained on About 45B Tokens

According to Beating, a team led by Kaiming He at MIT recently released ELF (Embedded Language Flows), a language diffusion model that departs from the autoregressive "predict the next token" approach used by GPT-style models. Instead, ELF generates text in a continuous embedding space and converts the result to discrete tokens only in the final step.
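To make the contrast with next-token prediction concrete, here is a minimal sketch of the general idea: start from Gaussian noise in embedding space, integrate a flow toward clean embeddings with a fixed number of Euler steps, and round each position to its nearest token embedding only at the very end. This is not ELF's published algorithm. The linear noise-to-data path, the Euler sampler, nearest-neighbor decoding, the toy vocabulary, and the helpers `sample_in_embedding_space`, `nearest_token`, and `toy_predictor` are all illustrative assumptions.

```python
import random


def nearest_token(vec, table):
    """Return the index of the embedding row closest to vec (squared L2 distance)."""
    best_id, best_dist = 0, float("inf")
    for tok_id, row in enumerate(table):
        dist = sum((a - b) ** 2 for a, b in zip(vec, row))
        if dist < best_dist:
            best_id, best_dist = tok_id, dist
    return best_id


def sample_in_embedding_space(predict_clean, seq_len, dim, steps=32, seed=0):
    """Hypothetical sampler: Euler integration from noise (t=0) to data (t=1).

    Assumes the linear path x_t = (1 - t) * noise + t * data, whose velocity
    can be written as (x1_hat - x_t) / (1 - t) given an estimate x1_hat of the
    clean embeddings. All positions are updated in parallel at every step.
    """
    rng = random.Random(seed)
    # t = 0: one Gaussian noise vector per sequence position.
    x = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(seq_len)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # never reaches 1, so the division below is safe
        x_hat = predict_clean(x, t)  # model's estimate of the clean embeddings
        x = [
            [xi + dt * (hi - xi) / (1.0 - t) for xi, hi in zip(row, hrow)]
            for row, hrow in zip(x, x_hat)
        ]
    return x  # still continuous vectors; no tokens have been chosen yet


# Toy 4-token vocabulary with 2-D embeddings (illustrative only).
EMBED = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
TARGET = [2, 0, 3, 1]


def toy_predictor(x, t):
    # Hypothetical stand-in for a trained network: always predicts TARGET's embeddings.
    return [list(EMBED[tok]) for tok in TARGET]


final = sample_in_embedding_space(toy_predictor, seq_len=len(TARGET), dim=2, steps=32)
# Discretization happens only here, after the last flow step.
tokens = [nearest_token(v, EMBED) for v in final]
print(tokens)  # [2, 0, 3, 1]
```

The design point the sketch illustrates is that every position is refined in parallel over a fixed step budget (32 here, matching the step count in the reported results), rather than being committed left to right one token at a time.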

In unconditional generation benchmarks on OpenWebText, the 105M-parameter ELF-B reached a generation perplexity (Gen. PPL) of approximately 24.1 with 32 sampling steps, outperforming several discrete and continuous diffusion language model baselines. ELF-B was also trained on only about 45 billion tokens, roughly an order of magnitude fewer than comparable methods, which typically use more than 500 billion.
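The Gen. PPL figure can be read through the usual definition: an evaluator language model scores the generated samples, and perplexity is the exponential of the mean negative log-likelihood per token, so lower is better. The article does not say which evaluator model, sample length, or number of samples were used for ELF-B, so the hypothetical helper below shows only the formula, followed by the arithmetic behind the "order of magnitude" token comparison.

```python
import math


def generation_perplexity(token_logprobs):
    """Perplexity = exp(-mean log p) over all scored tokens.

    token_logprobs: natural-log probabilities that an evaluator LM assigns
    to each token of the generated samples (hypothetical input format).
    """
    if not token_logprobs:
        raise ValueError("need at least one scored token")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


# If the evaluator gave every token probability 1/24.1, Gen. PPL would be 24.1.
print(round(generation_perplexity([math.log(1 / 24.1)] * 1024), 6))  # 24.1

# Training-token comparison from the article: 500B vs. 45B tokens.
print(round(500e9 / 45e9, 1))  # 11.1
```

A ratio of about 11 is what "roughly one order of magnitude fewer" refers to.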

Disclaimer: The information on this page may come from third-party sources and is for reference only. It does not represent the views or opinions of Gate and does not constitute any financial, investment, or legal advice. Virtual asset trading involves high risk. Please do not rely solely on the information on this page when making decisions. For details, see the Disclaimer.