---
thumbnail: https://user-images.githubusercontent.com/54370274/243292723-fa703668-a931-41e1-8bcf-19c72203980b.png
tags:
- TextTovideo
- Text2Video
- text-to-video 
---

🐣 Please follow me for new updates https://twitter.com/camenduru <br />
🔥 Please join our discord server https://discord.gg/k5BwmmvJJU

![00041-3056174990](https://github.com/camenduru/Text-To-Video-Finetuning-colab/assets/54370274/fa703668-a931-41e1-8bcf-19c72203980b)

# Potat 1️⃣ 
First Open-Source 1024x576 Text-to-Video Model 🥳  

https://huggingface.co/vdo/potat1-5000/tree/main <br />
https://huggingface.co/vdo/potat1-10000/tree/main <br />
https://huggingface.co/vdo/potat1-10000-base-text-encoder/tree/main <br />
https://huggingface.co/vdo/potat1-15000/tree/main <br />
https://huggingface.co/vdo/potat1-20000/tree/main <br />
https://huggingface.co/vdo/potat1-25000/tree/main <br />
https://huggingface.co/vdo/potat1-30000/tree/main <br />
https://huggingface.co/vdo/potat1-35000/tree/main <br />
https://huggingface.co/vdo/potat1-40000/tree/main <br />
https://huggingface.co/vdo/potat1-45000/tree/main <br /> 
https://huggingface.co/vdo/potat1-50000/tree/main <br />
https://huggingface.co/vdo/potat1-50000-base-text-encoder/tree/main (same model as https://huggingface.co/camenduru/potat1, you are here) <br />
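The checkpoint repositories above follow a regular naming pattern: `vdo/potat1-<steps>` in 5,000-step increments, plus `-base-text-encoder` variants at 10,000 and 50,000 steps. If you want to compare checkpoints in a script, a small helper like the hypothetical `checkpoint_url` below can build the links. It is not part of any published package. It only formats strings and checks that the requested step count appears in the list above; it does not download or verify anything.

```python
# Hypothetical helper (not an official API): rebuilds the checkpoint URLs listed above.
# It only formats strings; it does not contact the Hugging Face Hub.

LISTED_STEPS = range(5000, 50001, 5000)    # 5000, 10000, ..., 50000
BASE_TEXT_ENCODER_STEPS = {10000, 50000}   # steps that also have a "-base-text-encoder" repo


def checkpoint_url(steps: int, base_text_encoder: bool = False) -> str:
    """Return the Hugging Face URL of a listed Potat 1 checkpoint."""
    if steps not in LISTED_STEPS:
        raise ValueError(f"no checkpoint listed for {steps} steps")
    if base_text_encoder and steps not in BASE_TEXT_ENCODER_STEPS:
        raise ValueError(f"no base-text-encoder variant listed for {steps} steps")
    suffix = "-base-text-encoder" if base_text_encoder else ""
    return f"https://huggingface.co/vdo/potat1-{steps}{suffix}/tree/main"


if __name__ == "__main__":
    for steps in LISTED_STEPS:
        print(checkpoint_url(steps))
    # The 50,000-step base-text-encoder variant is the one mirrored at camenduru/potat1.
    print(checkpoint_url(50000, base_text_encoder=True))
```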


### Info
Prototype Model <br />
Trained on a single A100 (40GB) from https://lambdalabs.com ❤ <br />
2197 clips, 68388 frames tagged with BLIP-2 ( [Salesforce/blip2-opt-6.7b-coco](https://huggingface.co/Salesforce/blip2-opt-6.7b-coco) ) <br />
train_steps: 10000 <br />
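The clip and frame counts above work out to an average of about 31 captioned frames per clip. This is only an average; the card does not give the per-clip distribution.

$$
\frac{68388 \ \text{tagged frames}}{2197 \ \text{clips}} \approx 31.1 \ \text{tagged frames per clip}
$$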

### Dataset & Config
https://huggingface.co/camenduru/potat1_dataset/tree/main

### Finetuning
https://github.com/Breakthrough/PySceneDetect <br />
https://github.com/ExponentialML/Video-BLIP2-Preprocessor <br />
https://github.com/ExponentialML/Text-To-Video-Finetuning <br />
https://github.com/camenduru/Text-To-Video-Finetuning-colab <br />
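The first tool in this chain, PySceneDetect, splits source videos into clips at shot boundaries before they are captioned and used for finetuning. PySceneDetect's own `ContentDetector` compares consecutive frames in HSV color space against a threshold. The hypothetical `detect_cuts` below is a deliberately simplified stand-in for that idea, not PySceneDetect's implementation. It flags a cut wherever the mean absolute grayscale difference between consecutive frames exceeds a threshold. Frames are plain lists of pixel values so the sketch runs without video libraries, and the default threshold of 30 is an arbitrary assumption.

```python
from typing import List, Sequence, Tuple

# Hypothetical sketch of content-based shot detection; not PySceneDetect's code.
Frame = Sequence[float]  # one grayscale frame, flattened to pixel values in 0-255


def frame_difference(a: Frame, b: Frame) -> float:
    """Mean absolute per-pixel difference between two equally sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def detect_cuts(frames: List[Frame], threshold: float = 30.0) -> List[int]:
    """Return the indices of frames that start a new shot."""
    return [
        i for i in range(1, len(frames))
        if frame_difference(frames[i - 1], frames[i]) > threshold
    ]


def split_into_clips(num_frames: int, cuts: List[int]) -> List[Tuple[int, int]]:
    """Turn cut indices into half-open (start, end) frame ranges, one per clip."""
    bounds = [0, *cuts, num_frames]
    return [(bounds[k], bounds[k + 1]) for k in range(len(bounds) - 1)]


if __name__ == "__main__":
    dark, bright = [10.0] * 4, [200.0] * 4
    frames = [dark, dark, dark, bright, bright]  # one hard cut at index 3
    cuts = detect_cuts(frames)
    print(cuts)                                  # [3]
    print(split_into_clips(len(frames), cuts))   # [(0, 3), (3, 5)]
```

Each resulting clip would then be passed on to the captioning step (Video-BLIP2-Preprocessor in the list above). In practice you would use PySceneDetect itself, which reads real video files and offers several detectors.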

### Base Model
https://huggingface.co/damo-vilab/modelscope-damo-text-to-video-synthesis <br />
https://www.modelscope.cn/models/damo/text-to-video-synthesis <br />

Thanks to [damo-vilab](https://damo.alibaba.com/) ❤ [ExponentialML](https://github.com/ExponentialML) ❤ [kabachuha](https://github.com/kabachuha) ❤ [@DiffusersLib](https://twitter.com/DiffusersLib) ❤ [@LambdaAPI](https://twitter.com/LambdaAPI) ❤ [@cerspense](https://twitter.com/cerspense) ❤ [@CiaraRowles1](https://twitter.com/CiaraRowles1) ❤ [@p1atdev_art](https://twitter.com/p1atdev_art)  ❤ <br />

Thanks to Orellius ❤ (important bug report) <br />

Please try it 🐣 <br />
https://github.com/camenduru/text-to-video-synthesis-colab <br />

<video src="https://github-production-user-asset-6210df.s3.amazonaws.com/54370274/244223223-c5201c8a-2815-4533-9474-1e312c564f4e.mp4" data-canonical-src="https://github-production-user-asset-6210df.s3.amazonaws.com/54370274/244223223-c5201c8a-2815-4533-9474-1e312c564f4e.mp4" controls="controls" muted="muted" class="d-block rounded-bottom-2 border-top width-fit" style="max-height:640px; min-height: 200px"></video>

Potat 2️⃣ is in the oven ♨ <br />