## VideoCraftXtend: AI-Enhanced Text-to-Video Generation with Extended Length and Enhanced Motion Smoothness

<a href='https://huggingface.co/spaces/ychenhq/VideoCrafterXtend'><img src='https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Model-blue'></a>

------

## Introduction

VideoCraftXtend is an open-source video generation and editing toolbox for crafting video content.

This project tackles key challenges in text-to-video (T2V) generation, focusing on producing longer videos, improving motion smoothness, and increasing content diversity. We propose a comprehensive framework that integrates a T2V diffusion model, leverages the OpenAI GPT API, incorporates a Video Quality Assessment (VQA) model, and refines an interpolation model.
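At a high level, the components above can be sketched as a loop: refine the prompt, generate candidate clips, score them with the VQA model, and interpolate the best one for smoother motion. The code below is a hypothetical sketch with stub functions standing in for the real components (the T2V model, the GPT API, the VQA model, and the RIFE interpolator); none of these names are the project's actual API:

```python
# Hypothetical sketch of the framework; every function is a stub standing in
# for a real component, not the project's actual API.

def refine_prompt(prompt):
    """Stand-in for the OpenAI GPT API call that enriches the user prompt."""
    return prompt + " (refined with richer visual detail)"

def generate_clip(prompt):
    """Stand-in for the T2V diffusion model; returns a clip descriptor."""
    return {"prompt": prompt, "frames": 16}

def assess_quality(clip):
    """Stand-in for the VQA model; higher scores mean better clips."""
    return len(clip["prompt"]) % 100 / 100.0  # dummy score

def interpolate(clip, factor=2):
    """Stand-in for the RIFE interpolation model; multiplies the frame count."""
    return dict(clip, frames=clip["frames"] * factor)

def craft_video(prompt, attempts=3):
    """Generate several candidates, keep the best-scoring one, then smooth it."""
    candidates = []
    for _ in range(attempts):
        prompt = refine_prompt(prompt)
        candidates.append(generate_clip(prompt))
    best = max(candidates, key=assess_quality)
    return interpolate(best)
```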
### 1. Generic Text-to-video Generation

Click the GIF to access the high-resolution video.
<table class="center">
<tr>
<td>
<video width="320" controls>
<source src="https://github.com/chloeleehn/VideoCraftXtend/blob/main/VideoCrafter/results/cat/0001.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</td>
<td>
<video width="320" controls>
<source src="https://github.com/chloeleehn/VideoCraftXtend/blob/main/VideoCrafter/results/cat/0002.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</td>
<td>
<video width="320" controls>
<source src="https://github.com/chloeleehn/VideoCraftXtend/blob/main/VideoCrafter/results/cat/0003.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
</td>
</tr>
<tr>
<td style="text-align:center;" width="320">"There is a cat dancing on the sand."</td>
<td style="text-align:center;" width="320">"Behold the mesmerizing sight of a cat elegantly dancing amidst the soft grains of sand."</td>
<td style="text-align:center;" width="320">"The fluffy cat is joyfully prancing and twirling on the soft golden sand, its elegant movements mirroring the peaceful seaside setting."</td>
</tr>
</table>
## Setup

### 1. Install Environment

1) Via Anaconda

```bash
conda create -n videocraftxtend python=3.8.5
conda activate videocraftxtend
pip install -r requirements.txt
```

2) Using Google Colab Pro

### 2. Download the model checkpoints

1) Download the pretrained T2V model via [Hugging Face](https://huggingface.co/VideoCrafter/VideoCrafter2/blob/main/model.ckpt), and put `model.ckpt` at `VideoCrafter/checkpoints/base_512_v2/model.ckpt`.
2) Download the pretrained interpolation model via [Google Drive](https://drive.google.com/drive/folders/1TBEwF2PmSGyDngP1anjNswlIfwGh2NzU?usp=sharing), and put `flownet.pkl` at `VideoCrafter/ECCV2022-RIFE/train_log/flownet.pkl`.
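A missing or misplaced checkpoint is a common cause of startup errors, so it can help to verify the paths before running anything. The helper below is not part of the repository; it simply checks the two locations listed above:

```python
from pathlib import Path

# Expected checkpoint locations from the download steps above.
EXPECTED_CHECKPOINTS = [
    "VideoCrafter/checkpoints/base_512_v2/model.ckpt",
    "VideoCrafter/ECCV2022-RIFE/train_log/flownet.pkl",
]

def missing_checkpoints(repo_root="."):
    """Return the expected checkpoint files not yet present under repo_root."""
    root = Path(repo_root)
    return [p for p in EXPECTED_CHECKPOINTS if not (root / p).is_file()]
```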
## Inference

### 1. Text-to-Video local Gradio demo

1) Open the `VideoCraftXtend.ipynb` notebook and run the cells until the Gradio interface is generated.
2) Once the Gradio interface is up, input a prompt and customize the parameters as needed. The resulting video should be generated within an estimated timeframe of 15-20 minutes.
3) The last section of `VideoCraftXtend.ipynb` contains the evaluation results that were included in our report.
---
## Technical Report

VideoCrafter2 tech report: [VideoCrafter2: Overcoming Data Limitations for High-Quality Video Diffusion Models](https://arxiv.org/abs/2401.09047)

## Acknowledgements

Our codebase builds on

1) [Stable Diffusion](https://github.com/Stability-AI/stablediffusion)
2) [VideoCrafter2](https://github.com/AILab-CVC/VideoCrafter)
3) [UVQ](https://github.com/google/uvq)
4) [VBench](https://github.com/Vchitect/VBench)
5) [RIFE](https://github.com/hzwer/ECCV2022-RIFE)

Thanks to the authors for sharing their codebases!

## Disclaimer

We developed this repository for RESEARCH purposes, so it may only be used for personal, research, or other non-commercial purposes.