RAEv2, jointly released by Adobe Research, ANU, and Xie Saining's team at NYU, uses a diffusion model in place of a VAE for image reconstruction, addressing the first version's poor reconstruction quality, incompatibility with CFG, and slow convergence. On ImageNet it reaches a GFID of 1.06 in 80 epochs, and a GFID below 2 in only 35 epochs, compared with 177 epochs for the previous version. Core innovations include multi-layer representations, which sum the outputs of the encoder's last K layers to preserve low-level structure, and a clarified complementary mechanism with REPA that strengthens generative capability.

CoinNetwork

2026-05-22 11:11:50

CryptoWorld News: The open-source RAEv2 project was jointly released by institutions including Adobe Research, the Australian National University (ANU), and Xie Saining's team at New York University (NYU). It converges about 10 times faster than its predecessor, surpassing the previous 800-epoch record with just 80 training epochs. RAEv2 is an image reconstruction scheme that uses diffusion models in place of the traditional variational autoencoder (VAE), and the new version addresses the first generation's pain points: poor reconstruction quality, an inability to use standard classifier-free guidance (CFG), and extremely slow convergence. On ImageNet, it achieves a generation FID (GFID) of 1.06 after only 80 training epochs. The research team describes three core architectural optimizations. Among them is a multi-layer representation scheme that directly sums the outputs of the encoder's last K layers, preserving low-level structure. The new architecture also clarifies how the representation autoencoder and representation alignment (REPA) complement each other, enabling stronger performance on generative tasks. In tests, the first-generation model needs 177 epochs to reach a GFID below 2, while the new architecture needs only 35, roughly a fivefold reduction.
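To make the multi-layer representation idea concrete, here is a minimal sketch of summing the outputs of an encoder's last K layers. It is an illustration only, not the RAEv2 implementation: the helper name `sum_last_k_layers`, the toy data, and the choice of an unweighted sum with no normalization are all assumptions made for the example.

```python
from typing import List

Vector = List[float]
Tokens = List[Vector]  # one feature vector per image token


def sum_last_k_layers(hidden_states: List[Tokens], k: int) -> Tokens:
    """Element-wise sum of the last k per-layer outputs (hypothetical helper)."""
    if not 1 <= k <= len(hidden_states):
        raise ValueError("k must be between 1 and the number of layers")
    selected = hidden_states[-k:]
    num_tokens = len(selected[0])
    dim = len(selected[0][0])
    # Add the selected layers position by position, so the result keeps
    # the same token count and feature width as any single layer.
    return [
        [sum(layer[t][d] for layer in selected) for d in range(dim)]
        for t in range(num_tokens)
    ]


# Toy stand-in for an encoder (assumption): 4 layers, 2 tokens, 3 features.
# Layer i outputs the constant value i + 1 everywhere.
toy_hidden_states = [
    [[float(layer + 1)] * 3 for _ in range(2)]
    for layer in range(4)
]

latent = sum_last_k_layers(toy_hidden_states, k=2)
print(latent)  # layers 3 and 4 summed: every entry is 3.0 + 4.0 = 7.0
```

Because the layers are summed rather than concatenated, the latent keeps the width of a single layer while still mixing in information from earlier, lower-level layers.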

PaperSculptureSquidward

· 5h ago

How do REPA and multi-layer representations work together? Waiting for the paper for a detailed analysis.

GateUser-b6d80ba0

· 22h ago

Summing the last K layers of the encoder feels a bit like a ResNet skip connection, but applied in the latent space.

NeonVortexInTheSmog

· 22h ago

Diffusion reconstruction + CFG compatibility, clearing technical debt in one go

CyberBridgeDeepPerspective

· 22h ago

GFID < 2 at epoch 35; this kind of efficiency makes the "alchemists" (model trainers) ecstatic.

RevokingPermissionsOnARainy

· 22h ago

Someone finally took the issue of VAE reconstruction blurriness seriously, tearing up.

HoldingPositionsIsLikeTending

· 22h ago

Adobe + ANU + NYU three partners join forces, maximizing resources

CandleAfterTheRain

· 22h ago

The multi-layer representation preserves low-level structure; the design is carefully thought out, not just a matter of stacking more depth.

BitByBitBenny

· 22h ago

GFID 1.06 in only 80 epochs, and the previous generation's 177 epochs slashed; convergence speed is skyrocketing.

GateUser-0f8d377b

· 22h ago

Xie Saining's team has connected reconstruction and generation this time; the REPA complementary mechanism has some substance.

Salt-BakedSentimentChart

· 22h ago

Using a diffusion model as a VAE is indeed a wild idea.

RAEv2 Open Source: Convergence Speed Increased by 10 Times, 80 Training Epochs Surpass the Previous 800 Epoch Record
