Release v1.2.0 · PKU-YuanGroup/Open-Sora-Plan

v1.2.0 is here! Using a 3D full attention architecture instead of 2+1D, we are releasing a true 3D video diffusion model trained on 4s 720p videos.

The architecture shifts from a 2+1D model to 3D full attention; 2+1D is no longer supported.
Instead of joint image-video training, the image weights are trained first and then used to initialize the video model.
All data annotations are released; the data are filtered by aesthetics and motion.
CausalVideoVAE performance is improved, and results are reported on the validation sets of WebVid and Panda70M.

Although the 3D attention architecture excels in spatio-temporal consistency, it is so expensive to train that it is difficult to scale up. We hope to collaborate with the open-source community to optimize the 3D DiT architecture. For further details, please refer to our report.
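To give a rough sense of why full 3D attention is costly, the sketch below counts query-key pairs for the two attention layouts. It is only a back-of-the-envelope illustration: the function names and the latent grid size (8 frames of 16x16 tokens) are assumptions chosen for the example, not the model's actual configuration, and the count ignores heads, hidden size, and every other part of the network.

```python
def attention_pairs_3d(t: int, h: int, w: int) -> int:
    """Query-key pairs when all t*h*w tokens attend to each other (full 3D)."""
    n = t * h * w
    return n * n


def attention_pairs_2plus1d(t: int, h: int, w: int) -> int:
    """Query-key pairs for factorized attention: spatial pass, then temporal pass."""
    s = h * w
    spatial = t * s * s   # each of the t frames attends within its own s tokens
    temporal = s * t * t  # each of the s spatial positions attends across t frames
    return spatial + temporal


# Hypothetical latent grid, for illustration only.
t, h, w = 8, 16, 16
full = attention_pairs_3d(t, h, w)
factorized = attention_pairs_2plus1d(t, h, w)
print(f"3D full attention pairs: {full}")
print(f"2+1D attention pairs:    {factorized}")
print(f"ratio:                   {full / factorized:.2f}")
```

Under these assumptions the ratio simplifies to t*s / (s + t) with s = h*w, so the gap widens as either the clip length or the resolution grows.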
