Spaces: ShaoTengLiu — Runtime error
Commit 69d3d9d • Parent: aa1f936
Commit message: debug
This view is limited to 50 files because the commit contains too many changes.
- Video-P2P-Beta +0 -1
- Video-P2P/.DS_Store +0 -0
- Video-P2P/.gitignore +3 -0
- Video-P2P/README.md +99 -0
- Video-P2P/configs/.DS_Store +0 -0
- Video-P2P/configs/bird-forest-p2p.yaml +17 -0
- Video-P2P/configs/bird-forest-tune.yaml +38 -0
- Video-P2P/configs/car-drive-p2p.yaml +16 -0
- Video-P2P/configs/car-drive-tune.yaml +38 -0
- Video-P2P/configs/man-motor-p2p.yaml +16 -0
- Video-P2P/configs/man-motor-tune.yaml +38 -0
- Video-P2P/configs/man-surfing-tune.yaml +38 -0
- Video-P2P/configs/penguin-run-p2p.yaml +16 -0
- Video-P2P/configs/penguin-run-tune.yaml +38 -0
- Video-P2P/configs/rabbit-jump-p2p.yaml +16 -0
- Video-P2P/configs/rabbit-jump-tune.yaml +38 -0
- Video-P2P/configs/tiger-forest-p2p.yaml +16 -0
- Video-P2P/configs/tiger-forest-tune.yaml +38 -0
- Video-P2P/data/.DS_Store +0 -0
- Video-P2P/data/car/.DS_Store +0 -0
- Video-P2P/data/car/1.jpg +0 -0
- Video-P2P/data/car/2.jpg +0 -0
- Video-P2P/data/car/3.jpg +0 -0
- Video-P2P/data/car/4.jpg +0 -0
- Video-P2P/data/car/5.jpg +0 -0
- Video-P2P/data/car/6.jpg +0 -0
- Video-P2P/data/car/7.jpg +0 -0
- Video-P2P/data/car/8.jpg +0 -0
- Video-P2P/data/motorbike/1.jpg +0 -0
- Video-P2P/data/motorbike/2.jpg +0 -0
- Video-P2P/data/motorbike/3.jpg +0 -0
- Video-P2P/data/motorbike/4.jpg +0 -0
- Video-P2P/data/motorbike/5.jpg +0 -0
- Video-P2P/data/motorbike/6.jpg +0 -0
- Video-P2P/data/motorbike/7.jpg +0 -0
- Video-P2P/data/motorbike/8.jpg +0 -0
- Video-P2P/data/penguin_ice/1.jpg +0 -0
- Video-P2P/data/penguin_ice/2.jpg +0 -0
- Video-P2P/data/penguin_ice/3.jpg +0 -0
- Video-P2P/data/penguin_ice/4.jpg +0 -0
- Video-P2P/data/penguin_ice/5.jpg +0 -0
- Video-P2P/data/penguin_ice/6.jpg +0 -0
- Video-P2P/data/penguin_ice/7.jpg +0 -0
- Video-P2P/data/penguin_ice/8.jpg +0 -0
- Video-P2P/data/rabbit/1.jpg +0 -0
- Video-P2P/data/rabbit/2.jpg +0 -0
- Video-P2P/data/rabbit/3.jpg +0 -0
- Video-P2P/data/rabbit/4.jpg +0 -0
- Video-P2P/data/rabbit/5.jpg +0 -0
- Video-P2P/data/rabbit/6.jpg +0 -0
Video-P2P-Beta DELETED
@@ -1 +0,0 @@
-Subproject commit 7a8fa7a8b8d81bbba367865f47b7894cdc4efafb
Video-P2P/.DS_Store ADDED
Binary file (6.15 kB).
Video-P2P/.gitignore ADDED
@@ -0,0 +1,3 @@
+*.pyc
+*.pt
+*.gif
Video-P2P/README.md ADDED
@@ -0,0 +1,99 @@
+# Video-P2P: Video Editing with Cross-attention Control
+The official implementation of [Video-P2P](https://video-p2p.github.io/).
+
+[Shaoteng Liu](https://www.shaotengliu.com/), [Yuechen Zhang](https://julianjuaner.github.io/), [Wenbo Li](https://fenglinglwb.github.io/), [Zhe Lin](https://sites.google.com/site/zhelin625/), [Jiaya Jia](https://jiaya.me/)
+
+[![Project Website](https://img.shields.io/badge/Project-Website-orange)](https://video-p2p.github.io/)
+[![arXiv](https://img.shields.io/badge/arXiv-2303.04761-b31b1b.svg)](https://arxiv.org/abs/2303.04761)
+
+![Teaser](./docs/teaser.png)
+
+## Changelog
+
+- 2023.03.20 Release Gradio demo.
+- 2023.03.19 Release code.
+- 2023.03.09 Paper preprint on arXiv.
+
+## Todo
+
+- [x] Release the code with 6 examples.
+- [x] Update a faster version.
+- [x] Release all data.
+- [ ] Release the Gradio demo.
+- [ ] Release more configs and new applications.
+
+## Setup
+
+```bash
+pip install -r requirements.txt
+```
+
+The code was tested on both Tesla V100 32GB and RTX 3090 24GB cards.
+
+The environment is similar to [Tune-A-Video](https://github.com/showlab/Tune-A-Video) and [prompt-to-prompt](https://github.com/google/prompt-to-prompt/).
+
+[xformers](https://github.com/facebookresearch/xformers) on the 3090 may run into this [issue](https://github.com/bryandlee/Tune-A-Video/issues/4).
+
+## Quickstart
+
+Please replace `pretrained_model_path` with the path to your Stable Diffusion checkpoint.
+
+```bash
+# You can reduce the number of tuning epochs to speed up.
+python run_tuning.py --config="configs/rabbit-jump-tune.yaml"  # Tuning initializes the model.
+
+# We provide a faster mode (about 1 min on a V100):
+python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml" --fast
+
+# The official mode (about 10 mins on a V100, more stable):
+python run_videop2p.py --config="configs/rabbit-jump-p2p.yaml"
+```
+
+## Dataset
+
+We release our dataset [here]().
+Download it under ./data and explore your creativity!
+
+## Results
+
+<table class="center">
+<tr>
+  <td width=50% style="text-align:center;">configs/rabbit-jump-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/penguin-run-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/rabbit.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/penguin-crochet.gif"></td>
+</tr>
+<tr>
+  <td width=50% style="text-align:center;">configs/man-motor-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/car-drive-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/motor.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/car.gif"></td>
+</tr>
+<tr>
+  <td width=50% style="text-align:center;">configs/tiger-forest-p2p.yaml</td>
+  <td width=50% style="text-align:center;">configs/bird-forest-p2p.yaml</td>
+</tr>
+<tr>
+  <td><img src="https://video-p2p.github.io/assets/tiger.gif"></td>
+  <td><img src="https://video-p2p.github.io/assets/bird-child.gif"></td>
+</tr>
+</table>
+
+## Citation
+
+```
+@misc{liu2023videop2p,
+  author={Liu, Shaoteng and Zhang, Yuechen and Li, Wenbo and Lin, Zhe and Jia, Jiaya},
+  title={Video-P2P: Video Editing with Cross-attention Control},
+  journal={arXiv:2303.04761},
+  year={2023},
+}
+```
+
+## References
+
+* prompt-to-prompt: https://github.com/google/prompt-to-prompt
+* Tune-A-Video: https://github.com/showlab/Tune-A-Video
+* diffusers: https://github.com/huggingface/diffusers
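The two Quickstart commands form a pipeline: tuning writes a checkpoint to the tune config's `output_dir`, and the matching `*-p2p.yaml` then loads it via `pretrained_model_path`, so the two paths must agree. A minimal sanity check of that contract, sketched in pure Python with the rabbit-jump values inlined (illustrative only; the real scripts read the YAML files):

```python
# Hypothetical inlined excerpts of the two rabbit-jump configs.
tune_cfg = {"output_dir": "./outputs/rabbit-jump"}
p2p_cfg = {"pretrained_model_path": "./outputs/rabbit-jump"}

def configs_chain(tune, p2p):
    """True when the editing stage loads the checkpoint the tuning stage wrote."""
    return tune["output_dir"] == p2p["pretrained_model_path"]

assert configs_chain(tune_cfg, p2p_cfg)
```

If the paths diverge, `run_videop2p.py` would simply not find the freshly tuned weights, so checking this up front saves a failed run.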
Video-P2P/configs/.DS_Store
ADDED
Binary file (6.15 kB). View file
|
|
Video-P2P/configs/bird-forest-p2p.yaml ADDED
@@ -0,0 +1,17 @@
+pretrained_model_path: "./outputs/bird-forest"
+image_path: "./data/bird_forest"
+prompt: "a bird flying in the forest"
+prompts:
+  - "a bird flying in the forest"
+  - "children drawing of a bird flying in the forest"
+eq_params:
+  words:
+    - "children"
+    - "drawing"
+  values:
+    - 5
+    - 2
+save_name: "children"
+is_word_swap: False
+cross_replace_steps: 0.8
+self_replace_steps: 0.7
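In the config above, `eq_params.words` and `eq_params.values` are parallel lists: each listed word in the target prompt gets an attention-amplification factor. A sketch of how the two lists could be paired (the helper name is hypothetical; the actual reweighting lives in the prompt-to-prompt attention controllers):

```python
def build_reweight_map(eq_params):
    """Pair each word with its amplification factor from the parallel lists."""
    return dict(zip(eq_params["words"], eq_params["values"]))

eq = {"words": ["children", "drawing"], "values": [5, 2]}
print(build_reweight_map(eq))  # {'children': 5, 'drawing': 2}
```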
Video-P2P/configs/bird-forest-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/bird-forest"
+
+train_data:
+  video_path: "./data/bird_forest"
+  prompt: "a bird flying in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a bird flying in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 600
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
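`trainable_modules` above is a list of name fragments: in Tune-A-Video-style tuning, a parameter is typically unfrozen when its qualified name contains one of these fragments, and everything else stays frozen. A sketch of that filter (the parameter names below are made-up illustrations, not actual UNet module paths):

```python
TRAINABLE = ["attn1.to_q", "attn2.to_q", "attn_temp"]

def is_trainable(param_name, fragments=TRAINABLE):
    """Unfreeze a parameter iff its name contains one of the fragments."""
    return any(frag in param_name for frag in fragments)

# Hypothetical parameter names:
assert is_trainable("blocks.0.attn1.to_q.weight")       # query projection: tuned
assert not is_trainable("blocks.0.attn1.to_k.weight")   # key projection: frozen
```

Restricting tuning to query projections and the temporal attention keeps most of the pretrained Stable Diffusion weights intact, which is why a few hundred steps suffice.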
Video-P2P/configs/car-drive-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/car-drive"
+image_path: "./data/car"
+prompt: "a car is driving on the road"
+prompts:
+  - "a car is driving on the road"
+  - "a car is driving on the railway"
+blend_word:
+  - 'road'
+  - 'railway'
+eq_params:
+  words:
+    - "railway"
+  values:
+    - 2
+save_name: "railway"
+is_word_swap: True
Video-P2P/configs/car-drive-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/car-drive"
+
+train_data:
+  video_path: "./data/car"
+  prompt: "a car is driving on the road"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a car is driving on the railway"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 300
+checkpointing_steps: 1000
+validation_steps: 300
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/man-motor-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/man-motor"
+image_path: "./data/motorbike"
+prompt: "a man is driving a motorbike in the forest"
+prompts:
+  - "a man is driving a motorbike in the forest"
+  - "a Spider-Man is driving a motorbike in the forest"
+blend_word:
+  - 'man'
+  - 'Spider-Man'
+eq_params:
+  words:
+    - "Spider-Man"
+  values:
+    - 4
+save_name: "spider"
+is_word_swap: True
Video-P2P/configs/man-motor-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/man-motor"
+
+train_data:
+  video_path: "./data/motorbike"
+  prompt: "a man is driving a motorbike in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a Spider-Man is driving a motorbike in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/man-surfing-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
+output_dir: "./outputs/man-surfing"
+
+train_data:
+  video_path: "data/man-surfing.mp4"
+  prompt: "a man is surfing"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a panda is surfing"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/penguin-run-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/penguin-run"
+image_path: "./data/penguin_ice"
+prompt: "a penguin is running on the ice"
+prompts:
+  - "a penguin is running on the ice"
+  - "a crochet penguin is running on the ice"
+blend_word:
+  - 'penguin'
+  - 'penguin'
+eq_params:
+  words:
+    - "crochet"
+  values:
+    - 4
+save_name: "crochet"
+is_word_swap: False
Video-P2P/configs/penguin-run-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/penguin-run"
+
+train_data:
+  video_path: "./data/penguin_ice"
+  prompt: "a penguin is running on the ice"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a crochet penguin is running on the ice"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 300
+checkpointing_steps: 1000
+validation_steps: 300
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/rabbit-jump-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/rabbit-jump"
+image_path: "./data/rabbit"
+prompt: "a rabbit is jumping on the grass"
+prompts:
+  - "a rabbit is jumping on the grass"
+  - "a origami rabbit is jumping on the grass"
+blend_word:
+  - 'rabbit'
+  - 'rabbit'
+eq_params:
+  words:
+    - "origami"
+  values:
+    - 2
+save_name: "origami"
+is_word_swap: False
Video-P2P/configs/rabbit-jump-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/rabbit-jump"
+
+train_data:
+  video_path: "./data/rabbit"
+  prompt: "a rabbit is jumping on the grass"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a origami rabbit is jumping on the grass"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/configs/tiger-forest-p2p.yaml ADDED
@@ -0,0 +1,16 @@
+pretrained_model_path: "./outputs/tiger-forest"
+image_path: "./data/tiger"
+prompt: "a tiger is walking in the forest"
+prompts:
+  - "a tiger is walking in the forest"
+  - "a Lego tiger is walking in the forest"
+blend_word:
+  - 'tiger'
+  - 'tiger'
+eq_params:
+  words:
+    - "Lego"
+  values:
+    - 2
+save_name: "lego"
+is_word_swap: False
Video-P2P/configs/tiger-forest-tune.yaml ADDED
@@ -0,0 +1,38 @@
+pretrained_model_path: "/data/stable-diffusion/stable-diffusion-v1-5"
+output_dir: "./outputs/tiger-forest"
+
+train_data:
+  video_path: "./data/tiger"
+  prompt: "a tiger is walking in the forest"
+  n_sample_frames: 8
+  width: 512
+  height: 512
+  sample_start_idx: 0
+  sample_frame_rate: 1
+
+validation_data:
+  prompts:
+    - "a Lego tiger is walking in the forest"
+  video_length: 8
+  width: 512
+  height: 512
+  num_inference_steps: 50
+  guidance_scale: 12.5
+  use_inv_latent: True
+  num_inv_steps: 50
+
+learning_rate: 3e-5
+train_batch_size: 1
+max_train_steps: 500
+checkpointing_steps: 1000
+validation_steps: 500
+trainable_modules:
+  - "attn1.to_q"
+  - "attn2.to_q"
+  - "attn_temp"
+
+seed: 33
+mixed_precision: fp16
+use_8bit_adam: False
+gradient_checkpointing: True
+enable_xformers_memory_efficient_attention: True
Video-P2P/data/.DS_Store ADDED
Binary file (10.2 kB).
Video-P2P/data/car/.DS_Store ADDED
Binary file (6.15 kB).
Video-P2P/data/car/1.jpg
ADDED
Video-P2P/data/car/2.jpg
ADDED
Video-P2P/data/car/3.jpg
ADDED
Video-P2P/data/car/4.jpg
ADDED
Video-P2P/data/car/5.jpg
ADDED
Video-P2P/data/car/6.jpg
ADDED
Video-P2P/data/car/7.jpg
ADDED
Video-P2P/data/car/8.jpg
ADDED
Video-P2P/data/motorbike/1.jpg
ADDED
Video-P2P/data/motorbike/2.jpg
ADDED
Video-P2P/data/motorbike/3.jpg
ADDED
Video-P2P/data/motorbike/4.jpg
ADDED
Video-P2P/data/motorbike/5.jpg
ADDED
Video-P2P/data/motorbike/6.jpg
ADDED
Video-P2P/data/motorbike/7.jpg
ADDED
Video-P2P/data/motorbike/8.jpg
ADDED
Video-P2P/data/penguin_ice/1.jpg
ADDED
Video-P2P/data/penguin_ice/2.jpg
ADDED
Video-P2P/data/penguin_ice/3.jpg
ADDED
Video-P2P/data/penguin_ice/4.jpg
ADDED
Video-P2P/data/penguin_ice/5.jpg
ADDED
Video-P2P/data/penguin_ice/6.jpg
ADDED
Video-P2P/data/penguin_ice/7.jpg
ADDED
Video-P2P/data/penguin_ice/8.jpg
ADDED
Video-P2P/data/rabbit/1.jpg
ADDED
Video-P2P/data/rabbit/2.jpg
ADDED
Video-P2P/data/rabbit/3.jpg
ADDED
Video-P2P/data/rabbit/4.jpg
ADDED
Video-P2P/data/rabbit/5.jpg
ADDED
Video-P2P/data/rabbit/6.jpg
ADDED