I just saw that Intel has published INT4-quantized versions of Alibaba's Wan 2.2 video models on Hugging Face. This is quite interesting from a model-optimization perspective.

Basically, Intel shrank each model's weights substantially: a weight that occupies 2 bytes in BF16 takes up only 0.5 bytes (4 bits) after INT4 quantization, so the total size drops to roughly a quarter of the original. The quantization was done with Intel's AutoRound tool.
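The size math above is easy to sanity-check. A minimal sketch (illustrative parameter count only, ignoring the small overhead of the scales/zero-points that INT4 formats store alongside the weights):

```python
# Back-of-the-envelope check of the BF16 -> INT4 size reduction.

def weights_size_gb(num_params: int, bits_per_weight: int) -> float:
    """Raw weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

params = 14_000_000_000                   # e.g. the 14B active parameters
bf16 = weights_size_gb(params, 16)        # 2 bytes per weight -> 28.0 GB
int4 = weights_size_gb(params, 4)         # 0.5 bytes per weight -> 7.0 GB

print(f"BF16: {bf16:.1f} GB, INT4: {int4:.1f} GB, ratio: {int4 / bf16:.2f}")
# ratio: 0.25 -- a quarter of the original size, as described.
```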

The three released models are T2V-A14B (text-to-video), I2V-A14B (image-to-video), and TI2V-5B (hybrid text-and-image input). The original A14B models use a Mixture-of-Experts (MoE) architecture with 27 billion parameters in total, 14 billion of them active per step. Without INT4 quantization, they require at least 80GB of VRAM per GPU just to generate at 720p resolution.
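For a rough sense of where that 80GB goes, here is the weight-memory share for the 27B-parameter MoE models. Note this is only the weights; the rest of the 80GB budget would be consumed by activations, video latents, and attention buffers, which quantization does not shrink:

```python
# Rough weight-memory estimate for the A14B MoE models (27B total params).
# Weights alone do not account for the full 80 GB figure; the remainder
# is runtime working memory (activations, latents, attention buffers).

def weight_gb(num_params: float, bytes_per_weight: float) -> float:
    return num_params * bytes_per_weight / 1e9

total_params = 27e9
print(f"BF16 weights: {weight_gb(total_params, 2.0):.1f} GB")   # 54.0 GB
print(f"INT4 weights: {weight_gb(total_params, 0.5):.1f} GB")   # 13.5 GB
```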

The most practical of the three is TI2V-5B, a dense model that can already generate 720p video at 24fps on a single RTX 4090 in its original form. Imagine what it can do with INT4 optimization applied.

An important caveat: Intel has not yet released comprehensive benchmarks on VRAM consumption or visual quality after INT4 quantization, so that will depend on third-party verification. For those wanting to test, Intel points to the vllm-omni branch as the deployment option, since these models do not run on the main vLLM pipeline.

This kind of optimization makes these video models much more accessible to those without high-end hardware.