Update README.md
README.md
---
license: apache-2.0
language:
- en
pipeline_tag: text-to-video
tags:
- diffusion
- video-to-video
- stable-diffusion
---

# Live2Diff: **Live** Stream Translation via Uni-directional Attention in Video **Diffusion** Models

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/62fb2a9dc95d426ff8f74c8d/XoBgMAR3O13n7ib3b0Qj2.png" width=100%>
</p>

**Authors:** [Zhening Xing](https://github.com/LeoXing1996), [Gereon Fox](https://people.mpi-inf.mpg.de/~gfox/), [Yanhong Zeng](https://zengyh1900.github.io/), [Xingang Pan](https://xingangpan.github.io/), [Mohamed Elgharib](https://people.mpi-inf.mpg.de/~elgharib/), [Christian Theobalt](https://people.mpi-inf.mpg.de/~theobalt/), [Kai Chen †](https://chenkai.site/) (†: corresponding author)

[![arXiv](https://img.shields.io/badge/arXiv-2407.08701-b31b1b.svg)](https://arxiv.org/abs/2407.08701) [![Project Page](https://img.shields.io/badge/Project-Page-blue)](https://live2diff.github.io/) [![Github Repo](https://img.shields.io/badge/Github-Repo-blue?logo=GitHub)](https://live2diff.github.io/)

## Key Features

<p align="center">
  <img src="https://cdn-uploads.huggingface.co/production/uploads/62fb2a9dc95d426ff8f74c8d/qJ3-K3m_8LMGQWVko7p07.png" width=100%>
</p>

* **Uni-directional** Temporal Attention with **Warmup** Mechanism (see the sketch after this list)
* **Multi-timestep KV-Cache** for Temporal Attention during Inference
* **Depth Prior** for Better Structural Consistency
* Compatible with **DreamBooth and LoRA** for Various Styles
* **TensorRT** Supported

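The first two features can be illustrated with a small, self-contained sketch. This is **not** the Live2Diff implementation: it leaves out the diffusion U-Net and the multi-denoising-timestep aspect of the cache, and the names (`TemporalKVCache`, `warmup_attention`, `streaming_step`) are invented for illustration. It only shows the core mechanism: warmup frames attend to each other with full attention, after which every incoming frame attends only to itself and the keys/values cached from past frames.

```python
import torch
import torch.nn.functional as F


class TemporalKVCache:
    """Rolling cache of temporal-attention keys/values for the most recent frames."""

    def __init__(self, max_frames: int):
        self.max_frames = max_frames
        self.k = None  # (batch*pixels, n_cached_frames, dim)
        self.v = None

    def append(self, k_new: torch.Tensor, v_new: torch.Tensor) -> None:
        # Concatenate along the frame axis, then drop the oldest frames.
        self.k = k_new if self.k is None else torch.cat([self.k, k_new], dim=1)
        self.v = v_new if self.v is None else torch.cat([self.v, v_new], dim=1)
        self.k = self.k[:, -self.max_frames:]
        self.v = self.v[:, -self.max_frames:]


def warmup_attention(q, k, v):
    """Warmup phase: the first few frames attend to each other (full attention)."""
    return F.scaled_dot_product_attention(q, k, v)


def streaming_step(q_new, k_new, v_new, cache: TemporalKVCache):
    """Streaming phase: the new frame attends only to itself and the cached past."""
    cache.append(k_new, v_new)
    return F.scaled_dot_product_attention(q_new, cache.k, cache.v)


if __name__ == "__main__":
    pixels, dim, warmup_len = 16, 64, 8
    cache = TemporalKVCache(max_frames=16)

    # Warmup clip: full (bi-directional) attention, then seed the cache.
    wq, wk, wv = (torch.randn(pixels, warmup_len, dim) for _ in range(3))
    _ = warmup_attention(wq, wk, wv)
    cache.append(wk, wv)

    # Live stream: each incoming frame only looks backwards via the cache.
    for _ in range(4):
        q, k, v = (torch.randn(pixels, 1, dim) for _ in range(3))
        out = streaming_step(q, k, v, cache)
        print(out.shape)  # torch.Size([16, 1, 64])
```

Because each streaming step reuses cached keys/values instead of recomputing attention over the whole clip, the per-frame cost of temporal attention stays roughly constant as the stream grows.
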
Speed is evaluated on **Ubuntu 20.04.6 LTS** with **PyTorch 2.2.2**, an **RTX 4090 GPU**, and an **Intel(R) Xeon(R) Platinum 8352V CPU**, with denoising steps set to 2.

| Resolution | TensorRT |    FPS    |
| :--------: | :------: | :-------: |
| 512 x 512  |  **On**  | **16.43** |
| 512 x 512  |   Off    |   6.91    |
| 768 x 512  |  **On**  | **12.15** |
| 768 x 512  |   Off    |   6.29    |

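FPS figures like those above are usually obtained by timing repeated pipeline calls on the GPU with explicit synchronization. The helper below is a generic timing sketch, not the repository's benchmark script; `pipe` is a placeholder for whatever single-frame video2video callable is being measured.

```python
import time
import torch


def measure_fps(pipe, frame, n_warmup: int = 10, n_iters: int = 100) -> float:
    """Return frames per second for a single-frame pipeline call."""
    for _ in range(n_warmup):    # warm up kernels / TensorRT engines / caches
        pipe(frame)
    torch.cuda.synchronize()     # make sure queued GPU work is done before timing
    start = time.perf_counter()
    for _ in range(n_iters):
        pipe(frame)
    torch.cuda.synchronize()     # wait for the last call to finish on the GPU
    return n_iters / (time.perf_counter() - start)
```
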
## Real-Time Video2Video Demo

<div align="center">
<table align="center">
<tbody>
  <tr align="center">
    <td>
      <p> Human Face (Web Camera Input) </p>
    </td>
    <td>
      <p> Anime Character (Screen Video Input) </p>
    </td>
  </tr>
  <tr align="center">
    <td>
      <video controls autoplay src="https://github.com/user-attachments/assets/c39e4b1f-e336-479a-af72-d07b1e3c6e30" width="100%"></video>
    </td>
    <td>
      <video controls autoplay src="https://github.com/user-attachments/assets/42727f46-b3cf-48ea-971c-9f653bf5a264" width="80%"></video>
    </td>
  </tr>
</tbody>
</table>
</div>

## Acknowledgements

The video and image demos in this repository were generated using [LCM-LoRA](https://huggingface.co/latent-consistency/lcm-lora-sdv1-5). The stream-batch technique from [StreamDiffusion](https://github.com/cumulo-autumn/StreamDiffusion) is used for model acceleration. The design of the video diffusion model is adapted from [AnimateDiff](https://github.com/guoyww/AnimateDiff). We use a third-party [MiDaS](https://github.com/lewiji/MiDaS) implementation that supports ONNX export. Our online demo is modified from [Real-Time-Latent-Consistency-Model](https://github.com/radames/Real-Time-Latent-Consistency-Model/).

## BibTeX

If you find this work helpful, please consider citing:

```bibtex
@article{xing2024live2diff,
  title={Live2Diff: Live Stream Translation via Uni-directional Attention in Video Diffusion Models},
  author={Zhening Xing and Gereon Fox and Yanhong Zeng and Xingang Pan and Mohamed Elgharib and Christian Theobalt and Kai Chen},
  journal={arXiv preprint arXiv:2407.08701},
  year={2024}
}
```