image generation | Shreyansh Singh

Paper Summary #11 - Sora

Technical Paper: Sora - Creating video from text
Blog: Video generation models as world simulators

These are just short notes / excerpts from the technical paper for quick lookup.

Sora is quite a breakthrough. It is able to understand and simulate the physical world, generating high-definition videos up to 60 seconds long while maintaining quality and scene continuity and following the user’s prompt.

Key papers Sora is built upon:
- Diffusion Transformer (DiT)
- Latent Diffusion Models
- DALL-E 3 Image Recaptioning

Sora (being a diffusion transformer) uses the idea from ViT of using patches.
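
To make the ViT patch idea concrete, here is a minimal sketch of how a single image can be cut into non-overlapping patches and flattened into a sequence of tokens. This is not Sora's actual code: the function name `patchify`, the 32x32 image, and the patch size of 8 are all assumptions chosen for illustration, and a real model would additionally project each flattened patch through a learned linear layer.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping square patches.

    Hypothetical helper for illustration only; returns an array of shape
    (num_patches, patch_size * patch_size * C), ordered row by row.
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # (grid_h, p, grid_w, p, C): split each spatial axis into (grid index, offset)
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    # Bring the two grid axes to the front: (grid_h, grid_w, p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One flattened vector per patch
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Assumed toy input: a 32x32 RGB image with patch size 8
image = np.arange(32 * 32 * 3, dtype=np.float32).reshape(32, 32, 3)
tokens = patchify(image, patch_size=8)
print(tokens.shape)  # (16, 192)
```

Each row of `tokens` plays the role a word token plays in a language model: the transformer sees a sequence of 16 patch vectors rather than a 2D grid of pixels. The same reshape trick extends to video by adding a time axis, though the exact patch layout Sora uses is not covered in these notes.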