---
tags:
- autoencoder
- phenaki
---
# Phenaki CViViT - Obvious Research

![](assets_readme/obvious_research.png)

# Reproduction of the first step in the [text-to-video model Phenaki](https://arxiv.org/pdf/2210.02399.pdf).
## Code and model weights for the Transformer-based autoencoder for videos called CViViT.

![](assets_readme/phenaki.png)

## * Code, based on lucidrains' repo

The code is heavily based [on the reproduction of Phenaki](https://github.com/lucidrains/phenaki-pytorch) by the one and only [lucidrains](https://github.com/lucidrains). However, to actually train the model we had to make several modifications. Here is the list of changes compared to the original repo:

- added i3d video loss
- set loss weights, architecture parameters, and optimizer parameters closer to the paper
- added learning rate schedulers (warmup + annealing)
- added webdataset integration
- added video data preprocessing (8 fps, 11 frames per video, as in the paper)
- added vq L2 factorized codes (once again thanks to lucidrains)
- code is now compatible with multi-GPU and multi-node training
- added accelerate wandb integration
- added visualisation scripts
- minor bug fixes
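
To make two of the items above more concrete, here is a minimal sketch of what the frame sampling (8 fps, 11 frames per video) and a warmup + annealing learning rate schedule could look like. It is not the code from this repo: the function names `sample_frame_indices` and `lr_at_step` are hypothetical, and the linear warmup and cosine annealing shapes are assumptions, since the list above does not say which ones are used.

```python
import math


def sample_frame_indices(source_fps, num_source_frames, target_fps=8.0, num_frames=11):
    """Pick source-frame indices that approximate `target_fps` for a clip of `num_frames` frames.

    Hypothetical helper: the defaults mirror the 8 fps / 11 frames setting, everything else is assumed.
    """
    step = source_fps / target_fps  # number of source frames per sampled frame
    indices = [int(round(i * step)) for i in range(num_frames)]
    if indices[-1] >= num_source_frames:
        raise ValueError("video is too short for the requested clip")
    return indices


def lr_at_step(step, base_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warmup to `base_lr`, then cosine annealing down to `min_lr` (assumed schedule shape)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

For example, `sample_frame_indices(24, 100)` keeps every third frame of a 24 fps video, and `lr_at_step(step, 1e-4, 10, 100)` ramps up over the first 10 steps before decaying. The numeric values here are illustrative only.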

## * Model weight release, on Huggingface

We release the model weights of our best training run. The model was trained on the WebVid-10M dataset on a multi-node, multi-GPU setup.

Since CViViT is an autoencoder for videos, here are examples of videos and their reconstructions by the model:

With our logo at Obvious:

![](assets_readme/obvious_example.gif)


With the famous blue/red pill scene from The Matrix:

![](assets_readme/matrix_example.gif)

## * Next steps

We are working on the second stage of Phenaki training, which yields the full text-to-video model.

We appreciate any help, feel free to reach out! You can contact us:

- On Twitter: [@obv_research](https://twitter.com/obv_research)
- By mail: research.obvious@gmail.com

## * About Obvious Research

Obvious Research is an artificial intelligence research laboratory dedicated to creating new AI artistic tools, initiated by the artist trio [Obvious](https://obvious-art.com/), in partnership with La Sorbonne Université.