# Phenaki CViViT - Obvious Research

<p align="center">
<img src="assets_readme/obvious_research.png" alt="obvious_research" width="600"/>
</p>

# Reproduction of the first step of the [text-to-video model Phenaki](https://arxiv.org/pdf/2210.02399.pdf)
## Code and model weights for CViViT, the Transformer-based autoencoder for videos

<p align="center">
<img src="assets_readme/phenaki.png" alt="phenaki" width="600"/>
</p>

## Code, based on lucidrains' repo

The code is heavily based on [the reproduction of Phenaki](https://github.com/lucidrains/phenaki-pytorch) by the one and only [lucidrains](https://github.com/lucidrains). However, to actually train the model we had to make several modifications. Here is the list of modifications compared to the original repo:

- added I3D video loss
- brought loss weights, architecture parameters, and optimizer parameters closer to the paper
- added learning-rate schedulers (warmup + annealing)
- added webdataset integration
- added video data preprocessing (8 fps, 11 frames per video, as in the paper)
- added VQ L2 factorized codes (once again thanks to lucidrains)
- made the code compatible with multi-GPU and multi-node training
- added accelerate and wandb integration
- added visualisation scripts
- minor bug fixes
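
The warmup + annealing schedule mentioned above can be sketched in plain Python as follows. This is a minimal sketch, not our training configuration: the hyperparameter values (`base_lr`, `warmup_steps`, `total_steps`, `min_lr`) are illustrative assumptions.

```python
import math

def lr_at_step(step, base_lr=1e-4, warmup_steps=1000,
               total_steps=100_000, min_lr=1e-6):
    """Linear warmup followed by cosine annealing down to min_lr.

    All hyperparameter values here are illustrative assumptions, not the
    ones used for the released CViViT checkpoint.
    """
    if step < warmup_steps:
        # Ramp linearly from near zero up to base_lr over the warmup phase.
        return base_lr * (step + 1) / warmup_steps
    # Cosine-anneal from base_lr down to min_lr over the remaining steps.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

In a training loop this value would typically be assigned to the optimizer's parameter groups at every step, or wrapped in a scheduler such as PyTorch's `torch.optim.lr_scheduler.LambdaLR`.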

## Model weight release, on Hugging Face

We release the model weights from our best training run. The model is trained on the WebVid-10M dataset in a multi-node, multi-GPU setup.
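
To give an intuition for the compression this tokenizer performs, here is a back-of-the-envelope count of the discrete tokens CViViT produces per clip. CViViT embeds the first frame on its own (so the same model can also tokenize still images) and groups the remaining frames into temporal patches. The spatial resolution and patch sizes below are illustrative assumptions, not necessarily the released model's configuration; only the 11-frame clip length comes from the paper.

```python
def cvivit_token_count(num_frames=11, height=128, width=128,
                       patch_size=8, temporal_patch_size=2):
    """Discrete tokens per clip for a CViViT-style tokenizer.

    The spatial resolution and patch sizes are illustrative assumptions.
    """
    # The frames after the first must split evenly into temporal patches.
    assert (num_frames - 1) % temporal_patch_size == 0
    spatial_tokens = (height // patch_size) * (width // patch_size)
    # 1 position for the first frame + one per temporal patch of the rest.
    temporal_positions = 1 + (num_frames - 1) // temporal_patch_size
    return temporal_positions * spatial_tokens

print(cvivit_token_count())  # 6 temporal positions x 256 spatial tokens = 1536
```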

As CViViT is an autoencoder for videos, here are examples of videos and their reconstructions created by the model:

## Next steps

We are working on the second training stage of Phenaki, which yields the full text-to-video model.

We appreciate any help, so feel free to reach out! You can contact us:

- On Twitter: [@obv_research](https://twitter.com/obv_research)
- By email: research.obvious@gmail.com

## About Obvious Research

Obvious Research is an artificial intelligence research laboratory dedicated to creating new AI artistic tools, initiated by the artists’ trio [Obvious](https://obvious-art.com/), in partnership with La Sorbonne Université.