Spaces: ShaoTengLiu — Runtime error
Commit 69d3d9d • Parent: aa1f936
Commit message: debug
This view is limited to 50 files because the commit contains too many changes.
- Video-P2P-Beta +0 -1
- Video-P2P/.DS_Store +0 -0
- Video-P2P/.gitignore +3 -0
- Video-P2P/README.md +99 -0
- Video-P2P/configs/.DS_Store +0 -0
- Video-P2P/configs/bird-forest-p2p.yaml +17 -0
- Video-P2P/configs/bird-forest-tune.yaml +38 -0
- Video-P2P/configs/car-drive-p2p.yaml +16 -0
- Video-P2P/configs/car-drive-tune.yaml +38 -0
- Video-P2P/configs/man-motor-p2p.yaml +16 -0
- Video-P2P/configs/man-motor-tune.yaml +38 -0
- Video-P2P/configs/man-surfing-tune.yaml +38 -0
- Video-P2P/configs/penguin-run-p2p.yaml +16 -0
- Video-P2P/configs/penguin-run-tune.yaml +38 -0
- Video-P2P/configs/rabbit-jump-p2p.yaml +16 -0
- Video-P2P/configs/rabbit-jump-tune.yaml +38 -0
- Video-P2P/configs/tiger-forest-p2p.yaml +16 -0
- Video-P2P/configs/tiger-forest-tune.yaml +38 -0
- Video-P2P/data/.DS_Store +0 -0
- Video-P2P/data/car/.DS_Store +0 -0
- Video-P2P/data/car/1.jpg +0 -0
- Video-P2P/data/car/2.jpg +0 -0
- Video-P2P/data/car/3.jpg +0 -0
- Video-P2P/data/car/4.jpg +0 -0
- Video-P2P/data/car/5.jpg +0 -0
- Video-P2P/data/car/6.jpg +0 -0
- Video-P2P/data/car/7.jpg +0 -0
- Video-P2P/data/car/8.jpg +0 -0
- Video-P2P/data/motorbike/1.jpg +0 -0
- Video-P2P/data/motorbike/2.jpg +0 -0
- Video-P2P/data/motorbike/3.jpg +0 -0
- Video-P2P/data/motorbike/4.jpg +0 -0
- Video-P2P/data/motorbike/5.jpg +0 -0
- Video-P2P/data/motorbike/6.jpg +0 -0
- Video-P2P/data/motorbike/7.jpg +0 -0
- Video-P2P/data/motorbike/8.jpg +0 -0
- Video-P2P/data/penguin_ice/1.jpg +0 -0
- Video-P2P/data/penguin_ice/2.jpg +0 -0
- Video-P2P/data/penguin_ice/3.jpg +0 -0
- Video-P2P/data/penguin_ice/4.jpg +0 -0
- Video-P2P/data/penguin_ice/5.jpg +0 -0
- Video-P2P/data/penguin_ice/6.jpg +0 -0
- Video-P2P/data/penguin_ice/7.jpg +0 -0
- Video-P2P/data/penguin_ice/8.jpg +0 -0
- Video-P2P/data/rabbit/1.jpg +0 -0
- Video-P2P/data/rabbit/2.jpg +0 -0
- Video-P2P/data/rabbit/3.jpg +0 -0
- Video-P2P/data/rabbit/4.jpg +0 -0
- Video-P2P/data/rabbit/5.jpg +0 -0
- Video-P2P/data/rabbit/6.jpg +0 -0
Video-P2P-Beta DELETED
@@ -1 +0,0 @@
-Subproject commit 7a8fa7a8b8d81bbba367865f47b7894cdc4efafb
Video-P2P/.DS_Store ADDED
Binary file (6.15 kB).
Video-P2P/.gitignore ADDED
@@ -0,0 +1,3 @@
+*.pyc
+*.pt
+*.gif
Video-P2P/README.md ADDED
@@ -0,0 +1,99 @@
+# Video-P2P: Video Editing with Cross-attention Control
+The official implementation of [Video-P2P](https://video-p2p.github.io/).
+
+[Shaoteng Liu](https://www.shaotengliu.com/), [Yuechen Zhang](https://julianjuaner.github.io/), [Wenbo Li](https://fenglinglwb.github.io/), [Zhe Lin](https://sites.google.com/site/zhelin625/), [Jiaya Jia](https://jiaya.me/)
+
+[![Project Website](https://img.shields.io/badge/Project-Website-orange)](https://video-p2p.github.io/)
+[![arXiv](https://img.shields.io/badge/arXiv-2303.04761-b31b1b.svg)](https://arxiv.org/abs/2303.04761)
+
+![Teaser](./docs/teaser.png)
+
+## Changelog
+
+- 2023.03.20 Release Gradio demo.
+- 2023.03.19 Release code.
+- 2023.03.09 Paper preprint on arXiv.
+
+## Todo
+
+- [x] Release the code with 6 examples.
+- [x] Update a faster version.
+- [x] Release all data.
+- [ ] Release the Gradio demo.
+- [ ] Release more configs and new applications.
+
+## Setup
+
+```bash
+pip install -r requirements.txt
+```
+
+The code was tested on both Tesla V100 32GB and RTX 3090 24GB cards.
+
+The environment is similar to [Tune-A-Video](https://github.com/showlab/Tune-A-Video) and [prompt-to-prompt](https://github.com/google/prompt-to-prompt/).
+
+[xformers](https://github.com/facebookresearch/xformers) on the 3090 may run into this [issue](https://github.com/bryandlee/Tune-A-Video/issues/4).
+
+## Quickstart
+
+Please replace `pretrained_model_path` with the path to your Stable Diffusion checkpoint.
+
+```bash
+# You can reduce the number of tuning epochs to speed up.
+python run_tuning.py --config="configs/rabbit-jump-tune.yaml"  # Tuning initializes the model.
+
+# We provide a faster mode (about 1 min on a V100):
+python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast
+
+# The official mode (about 10 mins on a V100, more stable):
+python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml"
+```
+
+## Dataset
+
+We release our dataset [here]().
+Download it under ./data and explore your creativity!
+
+## Results
+
+<table class="center">
+<tr>
+  <td width=50% style="text-align:center;">configs/rabbit-jump-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/penguin-run-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/rabbit.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/penguin-crochet.gif"></td>
+</tr>
+<tr>
+  <td width=50% style="text-align:center;">configs/man-motor-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/car-drive-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/motor.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/car.gif"></td>
+</tr>
+<tr>
+  <td width=50% style="text-align:center;">configs/tiger-forest-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/bird-forest-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/tiger.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/bird-child.gif"></td>
+</tr>
+</table>
+
+## Citation
+
+```
+@misc{liu2023videop2p,
+  author={Liu, Shaoteng and Zhang, Yuechen and Li, Wenbo and Lin, Zhe and Jia, Jiaya},
+  title={Video-P2P: Video Editing with Cross-attention Control},
+  journal={arXiv:2303.04761},
+  year={2023},
+}
+```
+
+## References
+
+* prompt-to-prompt: https://github.com/google/prompt-to-prompt
+* Tune-A-Video: https://github.com/showlab/Tune-A-Video
+* diffusers: https://github.com/huggingface/diffusers
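The two Quickstart commands form a pipeline: tuning writes a checkpoint to the tune config's `output_dir`, and the matching `*-p2p.yaml` then loads it via `pretrained_model_path`, so the two paths must agree. A minimal sanity check of that contract, sketched in pure Python with the rabbit-jump values inlined (illustrative only; the real scripts read the YAML files):

```python
# Hypothetical inlined excerpts of the two rabbit-jump configs.
tune_cfg = {"output_dir": "./outputs/rabbit-jump"}
p2p_cfg = {"pretrained_model_path": "./outputs/rabbit-jump"}

def configs_chain(tune, p2p):
    """True when the editing stage loads the checkpoint the tuning stage wrote."""
    return tune["output_dir"] == p2p["pretrained_model_path"]

assert configs_chain(tune_cfg, p2p_cfg)
```

If the paths diverge, `run_videop2p.py` would simply not find the freshly tuned weights, so checking this up front saves a failed run.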
Video-P2P/configs/.DS_Store
ADDED
Binary file (6.15 kB). View file
|
|
Video-P2P/configs/bird-forest-p2p.yaml ADDED
@@ -0,0 +1,17 @@
+pretrained_model_path: "./outputs/bird-forest"
+image_path: "./data/bird_forest"
+prompt: "a bird flying in the forest"
+prompts:
+  - "a bird flying in the forest"
+  - "children drawing of a bird flying in the forest"
+eq_params:
+  words:
+    - "children"
+    - "drawing"
+  values:
+    - 5
+    - 2
+save_name: "children"
+is_word_swap: False
+cross_replace_steps: 0.8
+self_replace_steps: 0.7
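In the config above, `eq_params.words` and `eq_params.values` are parallel lists: each listed word in the target prompt gets an attention-amplification factor. A sketch of how the two lists could be paired (the helper name is hypothetical; the actual reweighting lives in the prompt-to-prompt attention controllers):

```python
def build_reweight_map(eq_params):
    """Pair each word with its amplification factor from the parallel lists."""
    return dict(zip(eq_params["words"], eq_params["values"]))

eq = {"words": ["children", "drawing"], "values": [5, 2]}
print(build_reweight_map(eq))  # {'children': 5, 'drawing': 2}
```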
Video-P2P/configs/bird-forest-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/bird-forest"
+
+train_data:
+  video_path: "./data/bird_forest"
+  prompt: "a bird flying in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a bird flying in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 600
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
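`trainable_modules` above is a list of name fragments: in Tune-A-Video-style tuning, a parameter is typically unfrozen when its qualified name contains one of these fragments, and everything else stays frozen. A sketch of that filter (the parameter names below are made-up illustrations, not actual UNet module paths):

```python
TRAINABLE = ["attn1.to_q", "attn2.to_q", "attn_temp"]

def is_trainable(param_name, fragments=TRAINABLE):
    """Unfreeze a parameter iff its name contains one of the fragments."""
    return any(frag in param_name for frag in fragments)

# Hypothetical parameter names:
assert is_trainable("blocks.0.attn1.to_q.weight")       # query projection: tuned
assert not is_trainable("blocks.0.attn1.to_k.weight")   # key projection: frozen
```

Restricting tuning to query projections and the temporal attention keeps most of the pretrained Stable Diffusion weights intact, which is why a few hundred steps suffice.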
Video-P2P/configs/car-drive-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/car-drive"
+image_path: "./data/car"
+prompt: "a car is driving on the road"
+prompts:
+  - "a car is driving on the road"
+  - "a car is driving on the railway"
+blend_word:
+  - 'road'
+  - 'railway'
+eq_params:
+  words:
+    - "railway"
+  values:
+    - 2
+save_name: "railway"
+is_word_swap: True
Video-P2P/configs/car-drive-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/car-drive"
+
+train_data:
+  video_path: "./data/car"
+  prompt: "a car is driving on the road"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a car is driving on the railway"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 300
+checkpointing_steps: 1000
+validation_steps: 300
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/man-motor-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/man-motor"
+image_path: "./data/motorbike"
+prompt: "a man is driving a motorbike in the forest"
+prompts:
+  - "a man is driving a motorbike in the forest"
+  - "a Spider-Man is driving a motorbike in the forest"
+blend_word:
+  - 'man'
+  - 'Spider-Man'
+eq_params:
+  words:
+    - "Spider-Man"
+  values:
+    - 4
+save_name: "spider"
+is_word_swap: True
Video-P2P/configs/man-motor-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/man-motor"
+
+train_data:
+  video_path: "./data/motorbike"
+  prompt: "a man is driving a motorbike in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a Spider-Man is driving a motorbike in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/man-surfing-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
+output_dir: "./outputs/man-surfing"
+
+train_data:
+  video_path: "data/man-surfing.mp4"
+  prompt: "a man is surfing"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a panda is surfing"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/penguin-run-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/penguin-run"
+image_path: "./data/penguin_ice"
+prompt: "a penguin is running on the ice"
+prompts:
+  - "a penguin is running on the ice"
+  - "a crochet penguin is running on the ice"
+blend_word:
+  - 'penguin'
+  - 'penguin'
+eq_params:
+  words:
+    - "crochet"
+  values:
+    - 4
+save_name: "crochet"
+is_word_swap: False
Video-P2P/configs/penguin-run-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/penguin-run"
+
+train_data:
+  video_path: "./data/penguin_ice"
+  prompt: "a penguin is running on the ice"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a crochet penguin is running on the ice"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 300
+checkpointing_steps: 1000
+validation_steps: 300
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/rabbit-jump-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/rabbit-jump"
+image_path: "./data/rabbit"
+prompt: "a rabbit is jumping on the grass"
+prompts:
+  - "a rabbit is jumping on the grass"
+  - "a origami rabbit is jumping on the grass"
+blend_word:
+  - 'rabbit'
+  - 'rabbit'
+eq_params:
+  words:
+    - "origami"
+  values:
+    - 2
+save_name: "origami"
+is_word_swap: False
Video-P2P/configs/rabbit-jump-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/rabbit-jump"
+
+train_data:
+  video_path: "./data/rabbit"
+  prompt: "a rabbit is jumping on the grass"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a origami rabbit is jumping on the grass"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/tiger-forest-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/tiger-forest"
+image_path: "./data/tiger"
+prompt: "a tiger is walking in the forest"
+prompts:
+  - "a tiger is walking in the forest"
+  - "a Lego tiger is walking in the forest"
+blend_word:
+  - 'tiger'
+  - 'tiger'
+eq_params:
+  words:
+    - "Lego"
+  values:
+    - 2
+save_name: "lego"
+is_word_swap: False
Video-P2P/configs/tiger-forest-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/tiger-forest"
+
+train_data:
+  video_path: "./data/tiger"
+  prompt: "a tiger is walking in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a Lego tiger is walking in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/data/.DS_Store ADDED
Binary file (10.2 kB).
Video-P2P/data/car/.DS_Store ADDED
Binary file (6.15 kB).
Video-P2P/data/car/1.jpg
ADDED
Video-P2P/data/car/2.jpg
ADDED
Video-P2P/data/car/3.jpg
ADDED
Video-P2P/data/car/4.jpg
ADDED
Video-P2P/data/car/5.jpg
ADDED
Video-P2P/data/car/6.jpg
ADDED
Video-P2P/data/car/7.jpg
ADDED
Video-P2P/data/car/8.jpg
ADDED
Video-P2P/data/motorbike/1.jpg
ADDED
Video-P2P/data/motorbike/2.jpg
ADDED
Video-P2P/data/motorbike/3.jpg
ADDED
Video-P2P/data/motorbike/4.jpg
ADDED
Video-P2P/data/motorbike/5.jpg
ADDED
Video-P2P/data/motorbike/6.jpg
ADDED
Video-P2P/data/motorbike/7.jpg
ADDED
Video-P2P/data/motorbike/8.jpg
ADDED
Video-P2P/data/penguin_ice/1.jpg
ADDED
Video-P2P/data/penguin_ice/2.jpg
ADDED
Video-P2P/data/penguin_ice/3.jpg
ADDED
Video-P2P/data/penguin_ice/4.jpg
ADDED
Video-P2P/data/penguin_ice/5.jpg
ADDED
Video-P2P/data/penguin_ice/6.jpg
ADDED
Video-P2P/data/penguin_ice/7.jpg
ADDED
Video-P2P/data/penguin_ice/8.jpg
ADDED
Video-P2P/data/rabbit/1.jpg
ADDED
Video-P2P/data/rabbit/2.jpg
ADDED
Video-P2P/data/rabbit/3.jpg
ADDED
Video-P2P/data/rabbit/4.jpg
ADDED
Video-P2P/data/rabbit/5.jpg
ADDED
Video-P2P/data/rabbit/6.jpg
ADDED