RAEv2 Open-Sourced: 10x Faster Convergence, 80 Training Epochs Beat the Previous 800-Epoch Record

CryptoWorld News: RAEv2 has been open-sourced as a joint project of Adobe Research, the Australian National University (ANU), Xie Saining's team at New York University (NYU), and other institutions. It converges 10 times faster than before, surpassing the previous 800-epoch record with just 80 training epochs. As a diffusion-model-based image reconstruction scheme meant to replace the traditional variational autoencoder (VAE), the new version addresses pain points of the first generation: poor reconstruction quality, incompatibility with standard classifier-free guidance (CFG), and extremely slow convergence. On ImageNet, it reaches a generation FID (GFID) of 1.06 after only 80 training epochs. The research team reports three core architectural optimizations, including a multi-layer representation scheme that directly sums the outputs of the encoder's last K layers while preserving the structure of the underlying subspace. The new architecture also clarifies how the representation autoencoder and representation alignment (REPA) complement each other, enabling stronger performance on generative tasks. In tests, the first-generation model needs 177 epochs to reach a GFID below 2, while the new architecture needs only 35, roughly a fivefold reduction.
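To make the multi-layer representation idea concrete, here is a minimal sketch of summing the outputs of an encoder's last K layers. It is not the RAEv2 implementation: the helper name sum_last_k_layers, the nested-list data layout, and the toy numbers are all assumptions made for illustration, and a real encoder would produce tensors rather than Python lists.

```python
from typing import List, Sequence

Vector = List[float]
TokenGrid = List[Vector]  # one feature vector per image token


def sum_last_k_layers(layer_outputs: Sequence[TokenGrid], k: int) -> TokenGrid:
    """Element-wise sum of the last k per-layer outputs (hypothetical helper)."""
    if not 1 <= k <= len(layer_outputs):
        raise ValueError("k must be between 1 and the number of layers")
    selected = layer_outputs[-k:]
    num_tokens = len(selected[0])
    dim = len(selected[0][0])
    # Start from zeros and accumulate each selected layer in place.
    summed = [[0.0] * dim for _ in range(num_tokens)]
    for layer in selected:
        for t, token in enumerate(layer):
            for d, value in enumerate(token):
                summed[t][d] += value
    return summed


# Toy encoder trace: 3 layers, 2 tokens, 2 feature dimensions (made-up numbers).
toy_layers = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[2.0, 1.0], [1.0, 2.0]],
    [[4.0, 2.0], [2.0, 4.0]],
]
latent = sum_last_k_layers(toy_layers, k=2)
print(latent)  # [[6.0, 3.0], [3.0, 6.0]]
```

Because the layers are summed rather than concatenated, the result keeps the same shape as a single layer's output, so anything downstream sees a latent of unchanged dimensionality.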
PaperSculptureSquidward
· 5h ago
How do REPA and multi-layer representations work together? Waiting for the paper for a detailed analysis.
GateUser-b6d80ba0
· 22h ago
Summing the last K layers of the encoder feels a bit like a ResNet skip connection, but applied in the latent space.
NeonVortexInTheSmog
· 22h ago
Diffusion reconstruction + CFG compatibility, clearing technical debt in one go
CyberBridgeDeepPerspective
· 22h ago
GFID < 2 at epoch 35; this kind of efficiency will make the "alchemists" (model trainers) ecstatic.
RevokingPermissionsOnARainy
· 22h ago
Someone finally took VAE reconstruction blurriness seriously. I'm tearing up.
HoldingPositionsIsLikeTending
· 22h ago
Adobe + ANU + NYU: three partners joining forces, with resources maxed out.
CandleAfterTheRain
· 22h ago
The multi-layer representation preserves the underlying structure; this design is carefully thought out, not just a matter of stacking more depth.
BitByBitBenny
· 22h ago
GFID 1.06 in only 80 epochs, and the previous generation's 177-epoch figure more than halved; convergence speed is skyrocketing.
GateUser-0f8d377b
· 22h ago
Xie Saining's team has connected reconstruction and generation this time; the REPA complementary mechanism has some substance.
Salt-BakedSentimentChart
· 22h ago
Using a diffusion model as the VAE is indeed a wild idea.