According to Beating, Kaiming He's team at MIT recently released ELF (Embedded Language Flows), a language diffusion model that departs from the autoregressive next-token prediction used by GPT-style models. Instead, ELF generates text in a continuous embedding space and converts to discrete tokens only in the final step.
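To make the "continuous first, discrete last" idea concrete, here is a minimal toy sketch in Python. It is not ELF's architecture or training recipe: the four-word vocabulary, the 2-D embeddings, and the `toy_velocity` function (a hand-written stand-in for what would be a learned network) are all assumptions made for illustration. It only shows the general shape of such a sampler: start from noise, integrate a velocity field over a fixed number of steps, and map to tokens once at the end.

```python
import math
import random

# Hypothetical toy vocabulary with hand-picked 2-D embeddings (not from ELF).
EMBEDDINGS = {
    "the": (1.0, 0.0),
    "cat": (0.0, 1.0),
    "sat": (-1.0, 0.0),
    "down": (0.0, -1.0),
}


def nearest_token(vec):
    """Return the vocabulary token whose embedding is closest to vec."""
    return min(EMBEDDINGS, key=lambda tok: math.dist(vec, EMBEDDINGS[tok]))


def toy_velocity(vec, t):
    """Stand-in for a learned velocity network (an assumption, not ELF's model).

    Drifts toward the nearest embedding using the straight-line flow
    velocity (target - x) / (1 - t).
    """
    target = EMBEDDINGS[nearest_token(vec)]
    return tuple((g - x) / (1.0 - t) for g, x in zip(target, vec))


def sample(seq_len=4, steps=32, seed=0):
    """Euler-integrate every position from Gaussian noise (t=0) toward t=1."""
    rng = random.Random(seed)
    xs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(seq_len)]
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so the division in toy_velocity is safe
        xs = [
            tuple(x + dt * v for x, v in zip(vec, toy_velocity(vec, t)))
            for vec in xs
        ]
    # Discretize only now, after the continuous trajectory has finished.
    return xs, [nearest_token(vec) for vec in xs]


vectors, tokens = sample()
print(tokens)
```

All sequence positions are updated together at every step, which is the structural difference from an autoregressive loop that emits one token at a time.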
On OpenWebText unconditional generation benchmarks, the 105M-parameter ELF-B achieved a generation perplexity (Gen. PPL) of approximately 24.1 with 32-step sampling, outperforming multiple discrete and continuous diffusion language model baselines. Notably, ELF-B required only about 45 billion training tokens, roughly an order of magnitude fewer than comparable methods, which typically exceed 500 billion tokens.
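For readers unfamiliar with the metric, the sketch below shows how a perplexity figure is derived from per-token log-probabilities, and checks the training-budget ratio quoted above. Generation perplexity is commonly computed by scoring generated samples with a separate pretrained evaluator model; which evaluator was used for ELF is not stated here, and the log-probabilities in the example are made up purely for illustration.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


# Made-up per-token probabilities from a hypothetical evaluator model.
log_probs = [math.log(p) for p in (0.05, 0.02, 0.10, 0.04)]
print(round(perplexity(log_probs), 2))

# Training-budget ratio from the figures quoted above.
elf_tokens = 45e9
baseline_tokens = 500e9  # lower bound: baselines "typically exceed" this
print(round(baseline_tokens / elf_tokens, 1))
```

Lower perplexity means the evaluator finds the generated text more predictable; the ratio comes out at a little over 11, consistent with "roughly an order of magnitude".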