Leoxing committed on
Commit
0e6801b
1 Parent(s): 9bc8ca3

Update README.md

  1. README.md +85 -3
README.md CHANGED
@@ -1,3 +1,85 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ pipeline_tag: text-to-video
6
+ tags:
7
+ - diffusion
8
+ - video-to-video
9
+ - stable-diffusion
10
+ ---
11
+
12
+ # Live2Diff: **Live** Stream Translation via Uni-directional Attention in Video **Diffusion** Models
13
+
14
+ <p align="center">
15
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62fb2a9dc95d426ff8f74c8d/XoBgMAR3O13n7ib3b0Qj2.png" width=100%>
16
+ </p>
17
+
18
+ **Authors:** [Zhening Xing](https://github.com/LeoXing1996), [Gereon Fox](https://people.mpi-inf.mpg.de/~gfox/), [Yanhong Zeng](https://zengyh1900.github.io/), [Xingang Pan](https://xingangpan.github.io/), [Mohamed Elgharib](https://people.mpi-inf.mpg.de/~elgharib/), [Christian Theobalt](https://people.mpi-inf.mpg.de/~theobalt/), [Kai Chen †](https://chenkai.site/) (†: corresponding author)
19
+
20
+
21
+ [![arXiv](https://img.shields.io/badge/arXiv-2407.08701-b31b1b.svg)](https://arxiv.org/abs/2407.08701)[![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://live2diff.github.io/)[![Github Repo](https://img.shields.io/badge/Github-Repo-blue?logo=GitHub)](https://live2diff.github.io/)
22
+
23
+ ## Key Features
24
+
25
+ <p align="center">
26
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/62fb2a9dc95d426ff8f74c8d/qJ3-K3m_8LMGQWVko7p07.png" width=100%>
27
+ </p>
28
+
29
+ * **Uni-directional** Temporal Attention with **Warmup** Mechanism
30
+ * **Multitimestep KV-Cache** for Temporal Attention during Inference
31
+ * **Depth Prior** for Better Structure Consistency
32
+ * Compatible with **DreamBooth and LoRA** for Various Styles
33
+ * **TensorRT** Supported
34
+
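As a rough illustration of the first feature above, the sketch below builds a uni-directional (causal) temporal attention mask in which a few initial warmup frames stay visible to every later frame. The function name `unidirectional_mask`, the `num_warmup` parameter, and the choice to let warmup frames attend to each other in both directions are assumptions made for illustration; this is not the repository's actual implementation.

```python
def unidirectional_mask(num_frames, num_warmup):
    """Return a square boolean mask; mask[i][j] is True when frame i may attend to frame j.

    Hypothetical helper for illustration only (not part of the Live2Diff codebase).
    """
    if not 0 <= num_warmup <= num_frames:
        raise ValueError("num_warmup must be between 0 and num_frames")
    mask = []
    for i in range(num_frames):
        row = []
        for j in range(num_frames):
            causal = j <= i  # a frame sees itself and earlier frames only
            both_warmup = i < num_warmup and j < num_warmup  # assumed: warmup frames see each other
            row.append(causal or both_warmup)
        mask.append(row)
    return mask


mask = unidirectional_mask(num_frames=5, num_warmup=2)
for row in mask:
    print("".join("1" if allowed else "0" for allowed in row))
```

Because no frame after the warmup window attends to a later frame, keys and values computed for past frames never need to be recomputed, which is the property a KV-cache relies on in a streaming setting.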
35
+ The speed evaluation was conducted on **Ubuntu 20.04.6 LTS** with **PyTorch 2.2.2**, an **RTX 4090 GPU**, and an **Intel(R) Xeon(R) Platinum 8352V CPU**. The number of denoising steps is set to 2.
36
+
37
+ | Resolution | TensorRT | FPS |
38
+ | :--------: | :------: | :-------: |
39
+ | 512 x 512 | **On** | **16.43** |
40
+ | 512 x 512 | Off | 6.91 |
41
+ | 768 x 512 | **On** | **12.15** |
42
+ | 768 x 512 | Off | 6.29 |
43
+
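To read the table in per-frame terms, the short sketch below converts each FPS figure into a latency and computes the TensorRT speedup per resolution. The FPS values are copied from the table; the dictionary layout and helper names are illustrative assumptions.

```python
# FPS figures copied from the table above, keyed by (resolution, tensorrt_enabled).
FPS = {
    ("512 x 512", True): 16.43,
    ("512 x 512", False): 6.91,
    ("768 x 512", True): 12.15,
    ("768 x 512", False): 6.29,
}


def latency_ms(fps):
    """Average time spent on one frame, in milliseconds."""
    return 1000.0 / fps


def tensorrt_speedup(resolution):
    """Ratio of FPS with TensorRT on to FPS with TensorRT off."""
    return FPS[(resolution, True)] / FPS[(resolution, False)]


for resolution in ("512 x 512", "768 x 512"):
    on_ms = latency_ms(FPS[(resolution, True)])
    off_ms = latency_ms(FPS[(resolution, False)])
    print(f"{resolution}: {off_ms:.1f} ms -> {on_ms:.1f} ms per frame, "
          f"speedup x{tensorrt_speedup(resolution):.2f}")
```

On these numbers TensorRT gives roughly a 2.4x speedup at 512 x 512 and roughly 1.9x at 768 x 512.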
44
+ ## Real-Time Video2Video Demo
45
+
46
+ <div align="center">
47
+ <table align="center">
48
+ <tbody>
49
+ <tr align="center">
50
+ <td>
51
+ <p> Human Face (Web Camera Input) </p>
52
+ </td>
53
+ <td>
54
+ <p> Anime Character (Screen Video Input) </p>
55
+ </td>
56
+ </tr>
57
+ <tr align="center">
58
+ <td>
59
+ <video controls autoplay src="https://github.com/user-attachments/assets/c39e4b1f-e336-479a-af72-d07b1e3c6e30" width="100%"></video>
60
+ </td>
61
+ <td>
62
+ <video controls autoplay src="https://github.com/user-attachments/assets/42727f46-b3cf-48ea-971c-9f653bf5a264" width="80%"></video>
63
+ </td>
64
+ </tr>
65
+ </tbody>
66
+ </table>
67
+
68
+ </div>
69
+
70
+ ## Acknowledgements
71
+
72
+ The video and image demos in this GitHub repository were generated using [LCM-LoRA](https://huggingface.co/latent-consistency/lcm-lora-sdv1-5). Stream batch from [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion) is used for model acceleration. The design of the video diffusion model is adopted from [AnimateDiff](https://github.com/guoyww/AnimateDiff). We use a third-party implementation of [MiDaS](https://github.com/lewiji/MiDaS) that supports ONNX export. Our online demo is modified from [Real-Time-Latent-Consistency-Model](https://github.com/radames/Real-Time-Latent-Consistency-Model/).
73
+
74
+ ## BibTeX
75
+
76
+ If you find this work helpful, please consider citing it:
77
+
78
+ ```bibtex
79
+ @article{xing2024live2diff,
80
+ title={Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models},
81
+ author={Zhening Xing and Gereon Fox and Yanhong Zeng and Xingang Pan and Mohamed Elgharib and Christian Theobalt and Kai Chen},
82
+ journal={arXiv preprint arXiv:2407.08701},
83
+ year={2024}
84
+ }
85
+ ```