zheyangqin committed
Commit 9347c75 • Parent(s): 4bb176b

add-readme
Browse files
- .gitattributes +10 -0
- assets/vader_method.png +3 -0
- assets/videos/1.gif +3 -0
- assets/videos/10.gif +3 -0
- assets/videos/11.gif +3 -0
- assets/videos/3.gif +3 -0
- assets/videos/4.gif +3 -0
- assets/videos/5.gif +3 -0
- assets/videos/7.gif +3 -0
- assets/videos/8.gif +3 -0
- assets/videos/9.gif +3 -0
- readme.md +35 -0
.gitattributes CHANGED
@@ -33,3 +33,13 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33    *.zip filter=lfs diff=lfs merge=lfs -text
34    *.zst filter=lfs diff=lfs merge=lfs -text
35    *tfevents* filter=lfs diff=lfs merge=lfs -text
36  + assets/vader_method.png filter=lfs diff=lfs merge=lfs -text
37  + assets/videos/1.gif filter=lfs diff=lfs merge=lfs -text
38  + assets/videos/10.gif filter=lfs diff=lfs merge=lfs -text
39  + assets/videos/11.gif filter=lfs diff=lfs merge=lfs -text
40  + assets/videos/3.gif filter=lfs diff=lfs merge=lfs -text
41  + assets/videos/4.gif filter=lfs diff=lfs merge=lfs -text
42  + assets/videos/5.gif filter=lfs diff=lfs merge=lfs -text
43  + assets/videos/7.gif filter=lfs diff=lfs merge=lfs -text
44  + assets/videos/8.gif filter=lfs diff=lfs merge=lfs -text
45  + assets/videos/9.gif filter=lfs diff=lfs merge=lfs -text
assets/vader_method.png ADDED (Git LFS)
assets/videos/1.gif ADDED (Git LFS)
assets/videos/10.gif ADDED (Git LFS)
assets/videos/11.gif ADDED (Git LFS)
assets/videos/3.gif ADDED (Git LFS)
assets/videos/4.gif ADDED (Git LFS)
assets/videos/5.gif ADDED (Git LFS)
assets/videos/7.gif ADDED (Git LFS)
assets/videos/8.gif ADDED (Git LFS)
assets/videos/9.gif ADDED (Git LFS)
readme.md ADDED
@@ -0,0 +1,35 @@
1  + <div align="center">
2  +
3  + <!-- TITLE -->
4  + # **Video Diffusion Alignment via Reward Gradient**
5  + ![VADER](assets/vader_method.png)
6  +
7  + [![arXiv](https://img.shields.io/badge/cs.LG-)]()
8  + [![Website](https://img.shields.io/badge/🌎-Website-blue.svg)](http://vader-vid.github.io)
9  + [![GitHub](https://img.shields.io/github/stars/mihirp1998/VADER?style=social)](https://github.com/mihirp1998/VADER)
10 + </div>
11 +
12 + This is the official implementation of our paper [Video Diffusion Alignment via Reward Gradient](https://vader-vid.github.io/) by
13 +
14 + Mihir Prabhudesai*, Russell Mendonca*, Zheyang Qin*, Katerina Fragkiadaki, Deepak Pathak.
15 +
16 +
17 + <!-- DESCRIPTION -->
18 + ## Abstract
19 + We have made significant progress towards building foundational video diffusion models. As these models are trained using large-scale unsupervised data, it has become crucial to adapt these models to specific downstream tasks, such as video-text alignment or ethical video generation. Adapting these models via supervised fine-tuning requires collecting target datasets of videos, which is challenging and tedious. In this work, we instead utilize pre-trained reward models that are learned via preferences on top of powerful discriminative models. These models contain dense gradient information with respect to generated RGB pixels, which is critical for learning efficiently in complex search spaces such as videos. We show that our approach enables alignment of video diffusion for aesthetic generation, similarity between text context and video, as well as long-horizon video generation that is 3X longer than the training sequence length. We show that our approach learns much more efficiently, in terms of reward queries and compute, than previous gradient-free approaches for video generation.
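[Editor's note] The mechanism the abstract describes, a reward that is differentiable with respect to the generated pixels so that its gradient can flow back into the generator's weights, can be illustrated with a toy, self-contained sketch. This is an illustration under stated assumptions, not the authors' implementation: `TinyVideoGenerator` and `toy_reward` are hypothetical stand-ins for the video diffusion sampler and the preference-trained reward model.

```python
# Toy sketch of reward-gradient alignment (editor's illustration, not VADER's code).
# A small network stands in for the video diffusion sampler, and mean frame
# brightness stands in for a learned reward model. The point: the reward is
# differentiable w.r.t. generated RGB pixels, so gradient ascent on the reward
# updates the generator directly, with no gradient-free search over samples.
import torch
import torch.nn as nn

class TinyVideoGenerator(nn.Module):
    """Stand-in for a video diffusion sampler: latent -> (T, C, H, W) video in [0, 1]."""
    def __init__(self, frames=8, size=16, latent_dim=32):
        super().__init__()
        self.shape = (frames, 3, size, size)
        self.net = nn.Sequential(
            nn.Linear(latent_dim, frames * 3 * size * size),
            nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z).view(z.shape[0], *self.shape)

def toy_reward(video):
    """Stand-in for a preference-trained reward model: mean brightness per video."""
    return video.mean(dim=(1, 2, 3, 4))

gen = TinyVideoGenerator()
opt = torch.optim.Adam(gen.parameters(), lr=1e-3)

for step in range(200):
    z = torch.randn(4, 32)
    video = gen(z)                    # sampling kept on the autograd graph
    loss = -toy_reward(video).mean()  # minimize -reward = gradient ascent on reward
    opt.zero_grad()
    loss.backward()                   # dense pixel gradients flow into gen's weights
    opt.step()
```

In the paper's actual setting, the same gradient path runs backward through the denoising chain of a pretrained video diffusion model and a reward model such as an aesthetic or video-text score, rather than these stand-ins.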
20 +
21 + ## Demo
22 + |  |  |  |
23 + | --- | --- | --- |
24 + | <img src="assets/videos/8.gif"> | <img src="assets/videos/5.gif"> | <img src="assets/videos/7.gif"> |
25 + | <img src="assets/videos/10.gif"> | <img src="assets/videos/3.gif"> | <img src="assets/videos/4.gif"> |
26 + | <img src="assets/videos/9.gif"> | <img src="assets/videos/1.gif"> | <img src="assets/videos/11.gif"> |
+
## Citation
|
30 |
+
|
31 |
+
If you find this work useful in your research, please cite:
|
32 |
+
|
33 |
+
```bibtex
|
34 |
+
|
35 |
+
```
|