Spaces: hikerxu/fresco

hikerxu committed on Mar 27, 2024

Commit 7f1f1cb (verified) · 1 Parent(s): fca5a53

Upload folder using huggingface_hub

This view is limited to 50 files because it contains too many changes.

Files changed (50)

.gitattributes +2 -0
LICENSE.md +14 -0
README.md +203 -8
config/config_boxer.yaml +27 -0
config/config_carturn.yaml +30 -0
config/config_dog.yaml +27 -0
config/config_music.yaml +27 -0
data/boxer-punching-towards-camera.mp4 +3 -0
data/car-turn.mp4 +0 -0
data/dog.mp4 +0 -0
data/music.mp4 +0 -0
install.py +95 -0
model/README.md +0 -0
model/epoch_resnet.pth +3 -0
model/gmflow_sintel-0c07dcb3.pth +3 -0
output/1/video/0000.png +0 -0
output/1/video/0001.png +0 -0
output/1/video/0002.png +0 -0
output/1/video/0003.png +0 -0
output/1/video/0004.png +0 -0
output/1/video/0005.png +0 -0
output/1/video/0006.png +0 -0
output/1/video/0007.png +0 -0
output/1/video/0008.png +0 -0
output/1/video/0009.png +0 -0
output/1/video/0010.png +0 -0
output/1/video/0011.png +0 -0
output/1/video/0012.png +0 -0
output/1/video/0013.png +0 -0
output/1/video/0014.png +0 -0
output/1/video/0015.png +0 -0
output/1/video/0016.png +0 -0
output/1/video/0017.png +0 -0
output/1/video/0018.png +0 -0
output/1/video/0019.png +0 -0
output/1/video/0020.png +0 -0
output/1/video/0021.png +0 -0
output/1/video/0022.png +0 -0
output/1/video/0023.png +0 -0
output/1/video/0024.png +0 -0
output/1/video/0025.png +0 -0
output/1/video/0026.png +0 -0
output/1/video/0027.png +0 -0
output/1/video/0028.png +0 -0
output/1/video/0029.png +0 -0
output/1/video/0030.png +0 -0
output/1/video/0031.png +0 -0
output/1/video/0032.png +0 -0
output/1/video/0033.png +0 -0
output/1/video/0034.png +0 -0

.gitattributes CHANGED

@@ -33,3 +33,5 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+data/boxer-punching-towards-camera.mp4 filter=lfs diff=lfs merge=lfs -text
+src/ebsynth/deps/ebsynth/bin/ebsynth filter=lfs diff=lfs merge=lfs -text

LICENSE.md ADDED

	@@ -0,0 +1,14 @@

+# S-Lab License 1.0
+Copyright 2024 S-Lab
+Redistribution and use for non-commercial purpose in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
+1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
+2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
+3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.\
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+4. In the event that redistribution and/or use for commercial purpose in source or binary forms, with or without modification is required, please contact the contributor(s) of the work.
+---
+For the commercial use of the code, please consult Prof. Chen Change Loy (ccloy@ntu.edu.sg)

README.md CHANGED

@@ -1,12 +1,207 @@
 ---
-title: Fresco
-emoji: 💻
-colorFrom: gray
-colorTo: purple
+title: fresco
+app_file: webUI.py
 sdk: gradio
-sdk_version: 4.23.0
-app_file: app.py
-pinned: false
+sdk_version: 3.50.2
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# FRESCO - Official PyTorch Implementation
+**FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation**<br>
+[Shuai Yang](https://williamyang1991.github.io/), [Yifan Zhou](https://zhouyifan.net/), [Ziwei Liu](https://liuziwei7.github.io/) and [Chen Change Loy](https://www.mmlab-ntu.com/person/ccloy/)<br>
+in CVPR 2024 <br>
+[**Project Page**](https://www.mmlab-ntu.com/project/fresco/) | [**Paper**](https://arxiv.org/abs/2403.12962) | [**Supplementary Video**](https://youtu.be/jLnGx5H-wLw) | [**Input Data and Video Results**](https://drive.google.com/file/d/12BFx3hp8_jp9m0EmKpw-cus2SABPQx2Q/view?usp=sharing) <br>
+**Abstract:** *The remarkable efficacy of text-to-image diffusion models has motivated extensive exploration of their potential application in video domains.
+Zero-shot methods seek to extend image diffusion models to videos without necessitating model training.
+Recent methods mainly focus on incorporating inter-frame correspondence into attention mechanisms. However, the soft constraint imposed on determining where to attend to valid features can sometimes be insufficient, resulting in temporal inconsistency.
+In this paper, we introduce FRESCO, intra-frame correspondence alongside inter-frame correspondence to establish a more robust spatial-temporal constraint. This enhancement ensures a more consistent transformation of semantically similar content across frames. Beyond mere attention guidance, our approach involves an explicit update of features to achieve high spatial-temporal consistency with the input video, significantly improving the visual coherence of the resulting translated videos.
+Extensive experiments demonstrate the effectiveness of our proposed framework in producing high-quality, coherent videos, marking a notable improvement over existing zero-shot methods.*
+**Features**:<br>
+- **Temporal consistency**: uses intra- and inter-frame constraints with better consistency and coverage than optical flow alone.
+    - Compared with our previous work [Rerender-A-Video](https://github.com/williamyang1991/Rerender_A_Video), FRESCO is more robust to large and quick motion.
+- **Zero-shot**: no training or fine-tuning required.
+- **Flexibility**: compatible with off-the-shelf models (e.g., [ControlNet](https://github.com/lllyasviel/ControlNet), [LoRA](https://civitai.com/)) for customized translation.
+https://github.com/williamyang1991/FRESCO/assets/18130694/aad358af-4d27-4f18-b069-89a1abd94d38
+## Updates
+- [03/2024] Paper is released.
+- [03/2024] Code is released.
+- [03/2024] This website is created.
+### TODO
+- [x] Integrate into Diffusers
+- [x] Add Huggingface web demo
+- [x] ~~Add webUI.~~
+- [x] ~~Update readme~~
+- [x] ~~Upload paper to arXiv, release related material~~
+## Installation
+1. Clone the repository.
+```shell
+git clone https://github.com/williamyang1991/FRESCO.git
+cd FRESCO
+```
+2. You can simply set up the environment with pip based on [requirements.txt](https://github.com/williamyang1991/FRESCO/blob/main/requirements.txt)
+    - We have tested on torch 2.0.0/2.1.0 and diffusers 0.19.3
+    - If you use new versions of diffusers, you need to modify [my_forward()](https://github.com/williamyang1991/FRESCO/blob/fb991262615665de88f7a8f2cc903d9539e1b234/src/diffusion_hacked.py#L496)
+3. Run the installation script. The required models will be downloaded to `./model`, `./src/ControlNet/annotator` and `./src/ebsynth/deps/ebsynth/bin`.
+    - Requires access to huggingface.co
+```shell
+python install.py
+```
+4. You can run the demo with `run_fresco.py`
+```shell
+python run_fresco.py ./config/config_music.yaml
+```
+5. For issues with Ebsynth, please refer to [issues](https://github.com/williamyang1991/Rerender_A_Video#issues)
+## (1) Inference
+### WebUI (recommended)
+```shell
+python webUI.py
+```
+The Gradio app also allows you to flexibly change the inference options. Just try it for more details.
+Upload your video, input the prompt, select the model and seed, and hit:
+- **Run Key Frames**: detect keyframes, translate all keyframes.
+- **Run Propagation**: propagate the keyframes to other frames for full video translation
+- **Run All**: **Run Key Frames** and **Run Propagation**
+Select the model:
+- **Base model**: base Stable Diffusion model (SD 1.5)
+    - Stable Diffusion 1.5: official model
+    - [rev-Animated](https://huggingface.co/stablediffusionapi/rev-animated): a semi-realistic (2.5D) model
+    - [realistic-Vision](https://huggingface.co/SG161222/Realistic_Vision_V2.0): a photo-realistic model
+    - [flat2d-animerge](https://huggingface.co/stablediffusionapi/flat-2d-animerge): a cartoon model
+    - You can add other models on huggingface.co by modifying this [line](https://github.com/williamyang1991/FRESCO/blob/1afcca9c7b1bc1ac68254f900be9bd768fbb6988/webUI.py#L362)
+![overview](https://github.com/williamyang1991/FRESCO/assets/18130694/6ce5d54e-b020-4e43-95e7-72ab1783f482)
+We provide abundant advanced options to play with:
+<details id="option1">
+<summary> <b>Advanced options for single frame processing</b></summary>
+1. **Frame resolution**: resize the short side of the video to 512.
+2. ControlNet related:
+   - **ControlNet strength**: how well the output matches the input control edges
+   - **Control type**: HED edge, Canny edge, Depth map
+   - **Canny low/high threshold**: low values for more edge details
+3. SDEdit related:
+   - **Denoising strength**: repaint degree (low value to make the output look more like the original video)
+   - **Preserve color**: preserve the color of the original video
+4. SD related:
+   - **Steps**: number of denoising steps
+   - **CFG scale**: how well the output matches the prompt
+   - **Added prompt/Negative prompt**: supplementary prompts
+5. FreeU related:
+   - **FreeU first/second-stage backbone factor**: =1 do nothing; >1 enhance output color and details
+   - **FreeU first/second-stage skip factor**: =1 do nothing; <1 enhance output color and details
+</details>
+<details id="option2">
+<summary> <b>Advanced options for FRESCO constraints</b></summary>
+1. Keyframe related
+   - **Number of frames**: Total number of frames to be translated
+   - **Number of frames in a batch**: Use a small batch size to avoid running out of memory
+   - **Min keyframe interval (s_min)**: Consecutive keyframes are at least s_min frames apart
+   - **Max keyframe interval (s_max)**: Consecutive keyframes are at most s_max frames apart
+2. FRESCO constraints
+   - FRESCO-guided Attention:
+     - **spatial-guided attention**: Check to enable spatial-guided attention
+     - **cross-frame attention**: Check to enable efficient cross-frame attention
+     - **temporal-guided attention**: Check to enable temporal-guided attention
+   - FRESCO-guided optimization:
+     - **spatial-guided optimization**: Check to enable spatial-guided optimization
+     - **temporal-guided optimization**: Check to enable temporal-guided optimization
+3. **Background smoothing**: Check to enable background smoothing (best for static background)
+</details>
+<details id="option3">
+<summary> <b>Advanced options for the full video translation</b></summary>
+1. **Gradient blending**: apply Poisson Blending to reduce ghosting artifacts. May slow down the process and increase flickering.
+2. **Number of parallel processes**: multiprocessing to speed up the process. Large value (4) is recommended.
+</details>
+![option](https://github.com/williamyang1991/FRESCO/assets/18130694/72600758-1dff-4b7c-8f3f-65ee3909f8f6)
+### Command Line
+We provide a flexible script `run_fresco.py` to run our method.
+Set the options via a config file. For example,
+```shell
+python run_fresco.py ./config/config_music.yaml
+```
+We provide some example configs in the `config` directory.
+Most options in the config are the same as those in the WebUI.
+Please check the explanations in the WebUI section.
+We provide a separate Ebsynth python script `video_blend.py` with the temporal blending algorithm introduced in
+[Stylizing Video by Example](https://dcgi.fel.cvut.cz/home/sykorad/ebsynth.html) for interpolating style between key frames.
+It can work on your own stylized key frames independently of our FRESCO algorithm.
+For the details, please refer to our previous work [Rerender-A-Video](https://github.com/williamyang1991/Rerender_A_Video/tree/main?tab=readme-ov-file#our-ebsynth-implementation)
+## (2) Results
+### Key frame translation
+<table class="center">
+<tr>
+  <td><img src="https://github.com/williamyang1991/FRESCO/assets/18130694/e8d5776a-37c5-49ae-8ab4-15669df6f572" raw=true></td>
+  <td><img src="https://github.com/williamyang1991/FRESCO/assets/18130694/8a792af6-555c-4e82-ac1e-5c2e1ee35fdb" raw=true></td>
+  <td><img src="https://github.com/williamyang1991/FRESCO/assets/18130694/10f9a964-85ac-4433-84c5-1611a6c2c434" raw=true></td>
+  <td><img src="https://github.com/williamyang1991/FRESCO/assets/18130694/0ec0fbf9-90dd-4d8b-964d-945b5f6687c2" raw=true></td>
+</tr>
+<tr>
+  <td width=26.5% align="center">a red car turns in the winter</td>
+  <td width=26.5% align="center">an African American boxer wearing black boxing gloves punches towards the camera, cartoon style</td>
+  <td width=26.5% align="center">a cartoon spiderman in black suit, black shoes and white gloves is dancing</td>
+  <td width=20.5% align="center">a beautiful woman holding her glasses in CG style</td>
+</tr>
+</table>
+### Full video translation
+https://github.com/williamyang1991/FRESCO/assets/18130694/bf8bfb82-5cb7-4b2f-8169-cf8dbf408b54
+## Citation
+If you find this work useful for your research, please consider citing our paper:
+```bibtex
+@inproceedings{yang2024fresco,
+ title = {FRESCO: Spatial-Temporal Correspondence for Zero-Shot Video Translation},
+ author = {Yang, Shuai and Zhou, Yifan and Liu, Ziwei and Loy, Chen Change},
+ booktitle = {CVPR},
+ year = {2024},
+}
+```
+## Acknowledgments
+The code is mainly developed based on [Rerender-A-Video](https://github.com/williamyang1991/Rerender_A_Video), [ControlNet](https://github.com/lllyasviel/ControlNet), [Stable Diffusion](https://github.com/Stability-AI/stablediffusion), [GMFlow](https://github.com/haofeixu/gmflow) and [Ebsynth](https://github.com/jamriska/ebsynth).
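
The README's **Min/Max keyframe interval (s_min, s_max)** options bound how far apart consecutive keyframes may be. The following is a minimal sketch of one way such a constraint could be enforced, not the repository's actual detection code. The function name `select_keyframes` and the per-frame `scores` input (some measure of how much each frame changed) are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def select_keyframes(scores: Sequence[float], s_min: int, s_max: int) -> List[int]:
    """Pick keyframe indices whose spacing stays within [s_min, s_max].

    `scores` is a hypothetical per-frame "how much changed" value; the
    repository does not define it.
    """
    if not 1 <= s_min <= s_max:
        raise ValueError("need 1 <= s_min <= s_max")
    n = len(scores)
    if n == 0:
        return []
    keys = [0]  # always start from the first frame
    while keys[-1] + s_min < n:
        lo = keys[-1] + s_min
        hi = min(keys[-1] + s_max, n - 1)
        # Inside the allowed window, take the frame with the largest score
        # (ties resolve to the earliest frame).
        best = max(range(lo, hi + 1), key=lambda i: scores[i])
        keys.append(best)
    return keys


print(select_keyframes([0.0] * 12, 5, 5))
```

In this sketch, setting `s_min` equal to `s_max` (as `config_carturn.yaml` does with `mininterv: 5` and `maxinterv: 5`) reduces the selection to evenly spaced keyframes.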

config/config_boxer.yaml ADDED

	@@ -0,0 +1,27 @@

+# data
+file_path: './data/boxer-punching-towards-camera.mp4'
+save_path: './output/boxer-punching-towards-camera/'
+mininterv: 2 # for keyframe selection
+maxinterv: 2 # for keyframe selection
+# diffusion
+seed: 0
+prompt: 'An African American boxer wearing black boxing gloves punches towards the camera, cartoon style'
+sd_path: 'stablediffusionapi/flat-2d-animerge'
+use_controlnet: True
+controlnet_type: 'depth'  # 'hed', 'canny'
+cond_scale: 0.7
+use_freeu: False
+# video-to-video translation
+batch_size: 8
+num_inference_steps: 20
+num_warmup_steps: 5
+end_opt_step: 15
+run_ebsynth: False
+max_process: 4
+# supporting model
+gmflow_path: './model/gmflow_sintel-0c07dcb3.pth'
+sod_path: './model/epoch_resnet.pth'
+use_salinecy: True

config/config_carturn.yaml ADDED

	@@ -0,0 +1,30 @@

+# data
+file_path: './data/car-turn.mp4'
+save_path: './output/car-turn/'
+mininterv: 5 # for keyframe selection
+maxinterv: 5 # for keyframe selection
+# diffusion
+seed: 0
+prompt: 'a red car turns in the winter'
+# sd_path: 'runwayml/stable-diffusion-v1-5'
+# sd_path: 'stablediffusionapi/rev-animated'
+# sd_path: 'stablediffusionapi/flat-2d-animerge'
+sd_path: 'SG161222/Realistic_Vision_V2.0'
+use_controlnet: True
+controlnet_type: 'hed'  # 'depth', 'canny'
+cond_scale: 0.7
+use_freeu: False
+# video-to-video translation
+batch_size: 8
+num_inference_steps: 20
+num_warmup_steps: 5
+end_opt_step: 15
+run_ebsynth: False
+max_process: 4
+# supporting model
+gmflow_path: './model/gmflow_sintel-0c07dcb3.pth'
+sod_path: './model/epoch_resnet.pth'
+use_salinecy: True

config/config_dog.yaml ADDED

	@@ -0,0 +1,27 @@

+# data
+file_path: './data/dog.mp4'
+save_path: './output/dog/'
+mininterv: 10 # for keyframe selection
+maxinterv: 30 # for keyframe selection
+# diffusion
+seed: 0
+prompt: 'greetings from a fox by shaking front paws'
+sd_path: 'SG161222/Realistic_Vision_V2.0'
+use_controlnet: True
+controlnet_type: 'hed'  # 'depth', 'canny'
+cond_scale: 1.0
+use_freeu: False
+# video-to-video translation
+batch_size: 8
+num_inference_steps: 20
+num_warmup_steps: 8
+end_opt_step: 15
+run_ebsynth: False
+max_process: 4
+# supporting model
+gmflow_path: './model/gmflow_sintel-0c07dcb3.pth'
+sod_path: './model/epoch_resnet.pth'
+use_salinecy: True

config/config_music.yaml ADDED

	@@ -0,0 +1,27 @@

+# data
+file_path: './data/music.mp4'
+save_path: './output/music/'
+mininterv: 10 # for keyframe selection
+maxinterv: 30 # for keyframe selection
+# diffusion
+seed: 0
+prompt: 'A beautiful woman with headphones listening to music in CG cyberpunk style, neon, closed eyes, colorful'
+sd_path: 'stablediffusionapi/rev-animated'
+use_controlnet: True
+controlnet_type: 'hed'  # 'depth', 'canny'
+cond_scale: 1.0
+use_freeu: False
+# video-to-video translation
+batch_size: 8
+num_inference_steps: 20
+num_warmup_steps: 3
+end_opt_step: 15
+run_ebsynth: False
+max_process: 4
+# supporting model
+gmflow_path: './model/gmflow_sintel-0c07dcb3.pth'
+sod_path: './model/epoch_resnet.pth'
+use_salinecy: True
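
The four config files above share the same keys, so a few cross-field checks can catch a mistyped value before a long run. The sketch below works on an already-loaded dict and is not part of the repository. `check_config` is a hypothetical helper, and the rule that `end_opt_step` should not exceed `num_inference_steps` is an assumption drawn from the key names, not from the source.

```python
# Values named in the controlnet_type comments of the configs.
ALLOWED_CONTROLNET_TYPES = {'hed', 'canny', 'depth'}


def check_config(cfg: dict) -> list:
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if cfg['mininterv'] > cfg['maxinterv']:
        problems.append('mininterv is larger than maxinterv')
    if cfg['use_controlnet'] and cfg['controlnet_type'] not in ALLOWED_CONTROLNET_TYPES:
        problems.append(f"unknown controlnet_type {cfg['controlnet_type']!r}")
    if cfg['batch_size'] < 1:
        problems.append('batch_size must be at least 1')
    # Assumption (not stated in the source): optimization is expected to end
    # no later than the last denoising step.
    if cfg['end_opt_step'] > cfg['num_inference_steps']:
        problems.append('end_opt_step is larger than num_inference_steps')
    return problems


# Values copied from config/config_dog.yaml (only the keys checked above).
dog_cfg = {
    'mininterv': 10,
    'maxinterv': 30,
    'use_controlnet': True,
    'controlnet_type': 'hed',
    'batch_size': 8,
    'num_inference_steps': 20,
    'end_opt_step': 15,
}
print(check_config(dog_cfg))
```

Note that the key `use_salinecy` is spelled that way in all four configs; it is left untouched here because whatever code reads the config presumably expects that exact key.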

data/boxer-punching-towards-camera.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:265fc4d5e53bfdc1b8fb8b7792815bd86d8d5bd14b1463f41e5df7d9fc500525
+size 1467723
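
The three added lines are a Git LFS pointer: the repository stores only the spec version, a SHA-256 object id and the byte size, while the video itself lives in LFS storage. As a rough sketch, a fetched file can be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is expected without the leading `+` diff markers.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(' ', 1) for line in text.strip().splitlines())
    algo, digest = fields['oid'].split(':', 1)
    return {
        'version': fields['version'],
        'algo': algo,
        'digest': digest,
        'size': int(fields['size']),
    }


def matches_pointer(path, pointer: dict) -> bool:
    """Check that the file at `path` has the size and hash the pointer records."""
    data = Path(path).read_bytes()  # fine for small files; hash in chunks for large ones
    if len(data) != pointer['size']:
        return False
    return hashlib.new(pointer['algo'], data).hexdigest() == pointer['digest']


pointer = parse_lfs_pointer(
    'version https://git-lfs.github.com/spec/v1\n'
    'oid sha256:265fc4d5e53bfdc1b8fb8b7792815bd86d8d5bd14b1463f41e5df7d9fc500525\n'
    'size 1467723\n'
)
print(pointer['algo'], pointer['size'])
```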

data/car-turn.mp4 ADDED

Binary file (942 kB)

data/dog.mp4 ADDED

Binary file (759 kB)

data/music.mp4 ADDED

Binary file (830 kB)

install.py ADDED

	@@ -0,0 +1,95 @@

+import os
+import platform
+import requests
+def build_ebsynth():
+    if os.path.exists('src/ebsynth/deps/ebsynth/bin/ebsynth'):
+        print('Ebsynth has been built.')
+        return
+    os_str = platform.system()
+    if os_str == 'Windows':
+        print('Build Ebsynth Windows 64 bit.',
+              'If you want to build for 32 bit, please modify install.py.')
+        cmd = '.\\build-win64-cpu+cuda.bat'
+        exe_file = 'src/ebsynth/deps/ebsynth/bin/ebsynth.exe'
+    elif os_str == 'Linux':
+        cmd = 'bash build-linux-cpu+cuda.sh'
+        exe_file = 'src/ebsynth/deps/ebsynth/bin/ebsynth'
+    elif os_str == 'Darwin':
+        cmd = 'sh build-macos-cpu_only.sh'
+        exe_file = 'src/ebsynth/deps/ebsynth/bin/ebsynth.app'
+    else:
+        print('Cannot recognize OS. Ebsynth installation stopped.')
+        return
+    os.chdir('src/ebsynth/deps/ebsynth')
+    print(cmd)
+    os.system(cmd)
+    os.chdir('../../../..')
+    if os.path.exists(exe_file):
+        print('Ebsynth installed successfully.')
+    else:
+        print('Failed to install Ebsynth.')
+def download(url, dir, name=None):
+    os.makedirs(dir, exist_ok=True)
+    if name is None:
+        name = url.split('/')[-1]
+    path = os.path.join(dir, name)
+    if not os.path.exists(path):
+        print(f'Install {name} ...')
+        # Fail on HTTP errors instead of saving an error page as the file.
+        response = requests.get(url)
+        response.raise_for_status()
+        with open(path, 'wb') as f:
+            f.write(response.content)
+        print('Installed successfully.')
+def download_gmflow_ckpt():
+    url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+           'resolve/main/models/gmflow_sintel-0c07dcb3.pth')
+    download(url, 'model')
+def download_egnet_ckpt():
+    url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+           'resolve/main/models/epoch_resnet.pth')
+    download(url, 'model')
+def download_hed_ckpt():
+    url = ('https://huggingface.co/lllyasviel/Annotators/'
+           'resolve/main/ControlNetHED.pth')
+    download(url, 'src/ControlNet/annotator/ckpts')
+def download_depth_ckpt():
+    url = ('https://huggingface.co/lllyasviel/ControlNet/'
+           'resolve/main/annotator/ckpts/dpt_hybrid-midas-501f0c75.pt')
+    download(url, 'src/ControlNet/annotator/ckpts')
+def download_ebsynth_ckpt():
+    os_str = platform.system()
+    if os_str == 'Linux':
+        url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+               'resolve/main/models/ebsynth')
+        download(url, 'src/ebsynth/deps/ebsynth/bin')
+    elif os_str == 'Windows':
+        url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+               'resolve/main/models/ebsynth.exe')
+        download(url, 'src/ebsynth/deps/ebsynth/bin')
+        url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+               'resolve/main/models/ebsynth_cpu.dll')
+        download(url, 'src/ebsynth/deps/ebsynth/bin')
+        url = ('https://huggingface.co/PKUWilliamYang/Rerender/'
+               'resolve/main/models/ebsynth_cpu.exe')
+        download(url, 'src/ebsynth/deps/ebsynth/bin')
+    else:
+        print('No available compiled Ebsynth.')
+#build_ebsynth()
+download_ebsynth_ckpt()
+download_gmflow_ckpt()
+download_egnet_ckpt()
+download_hed_ckpt()
+download_depth_ckpt()
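
`download()` in `install.py` skips any file that already exists, so an interrupted download can leave a truncated checkpoint that later runs will not replace. One possible hardening, sketched here with only the standard library, streams to a temporary `.part` file and renames it once the transfer completes. The name `download_atomic` and the `.part` suffix are assumptions for illustration; the commit itself uses `requests` and writes directly to the final path.

```python
import os
import shutil
import urllib.request
from typing import Optional


def download_atomic(url: str, dest_dir: str, name: Optional[str] = None) -> str:
    """Stream `url` into dest_dir/name via a temporary `.part` file."""
    os.makedirs(dest_dir, exist_ok=True)
    if name is None:
        name = url.split('/')[-1]
    path = os.path.join(dest_dir, name)
    if os.path.exists(path):
        return path  # already present: same skip rule as install.py
    tmp_path = path + '.part'
    try:
        # urlopen raises an exception for HTTP error responses, so an error
        # page is never written to the final path.
        with urllib.request.urlopen(url) as response, open(tmp_path, 'wb') as f:
            shutil.copyfileobj(response, f)  # copies in chunks, not all at once
        os.replace(tmp_path, path)  # the final name appears only after a full copy
    finally:
        # Clean up the partial file if the transfer failed midway.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
    return path
```

Because the final path only appears after a complete transfer, the `os.path.exists` check stays a reliable "already installed" test across reruns.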