Commit 64db85e by kevinwang676 (parent: 595e4d5): Update README.md

README.md (CHANGED)
@@ -1,268 +1,10 @@

Removed (the previous SadTalker README):

<a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup></a>
<a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</sup></a>
<a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>
<a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>
<a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a> <br>
<a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo <sup>1</sup></a>
<a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup></a>
<a target='_blank'>Fei Wang <sup>1</sup></a>
</div>
<br>
<div>
<sup>1</sup> Xi'an Jiaotong University &nbsp; <sup>2</sup> Tencent AI Lab &nbsp; <sup>3</sup> Ant Group
</div>
<br>
<i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
<br>
<br>

![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)

<b>TL;DR: single portrait image 🙎♂️ + audio 🎤 = talking head video 🎞.</b>

<br>

</div>

## 🔥 Highlight

- 🔥 The extension for [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Check out more details [here](docs/webui_extension.md).

https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4

- 🔥 `full image mode` is online! Check out [here](https://github.com/Winfredy/SadTalker#full-bodyimage-generation) for more details.

| still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
|:--------------------:|:--------------------:|:----:|
| <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'> |

- 🔥 Several new modes, e.g. `still mode`, `reference mode`, and `resize mode`, are online for better and more customized applications.

- 🔥 Happy to see more community demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3), [YouTube](https://www.youtube.com/results?search_query=sadtalker&sp=CAM%253D) and [twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).

## 📋 Changelog (Previous changelogs can be found [here](docs/changlelog.md))

- __[2023.06.12]__: Added more new features to the WebUI extension, see the discussion [here](https://github.com/OpenTalker/SadTalker/discussions/386).

- __[2023.06.05]__: Released a new 512 beta face model. Fixed some bugs and improved performance.

- __[2023.04.15]__: Added an Automatic1111 colab by @camenduru, thanks for this awesome colab: [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb).

- __[2023.04.12]__: Added a more detailed sd-webui installation document and fixed a reinstallation problem.

- __[2023.04.12]__: Fixed sd-webui safety issues caused by third-party packages and optimized the output path in `sd-webui-extension`.

- __[2023.04.08]__: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.

- __[2023.04.08]__: v0.0.2: full image animation, added a Baidu cloud drive for downloading checkpoints, and optimized the enhancer logic.

## 🚧 TODO: See the Discussion https://github.com/OpenTalker/SadTalker/issues/280

## If you have any problem, please view our [FAQ](docs/FAQ.md) before opening an issue.

## ⚙️ 1. Installation.

Tutorials from communities: [Chinese Windows tutorial (中文windows教程)](https://www.bilibili.com/video/BV1Dc411W7V6/) | [Japanese course (日本語コース)](https://br-d.fanbox.cc/posts/5685086?utm_campaign=manage_post_page&utm_medium=share&utm_source=twitter)

### Linux:

1. Install [anaconda](https://www.anaconda.com/), python and git.

2. Create the env and install the requirements.
```bash
git clone https://github.com/Winfredy/SadTalker.git

cd SadTalker

conda create -n sadtalker python=3.8

conda activate sadtalker

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113

conda install ffmpeg

pip install -r requirements.txt

### TTS is optional for the gradio demo:
### pip install TTS
```
### Windows ([Chinese Windows tutorial, 中文windows教程](https://www.bilibili.com/video/BV1Dc411W7V6/)):

1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH".
2. Install [git](https://git-scm.com/download/win) manually (or `scoop install git` via [scoop](https://scoop.sh/)).
3. Install `ffmpeg` following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows) (or `scoop install ffmpeg` via [scoop](https://scoop.sh/)).
4. Download the SadTalker repository, for example by running `git clone https://github.com/Winfredy/SadTalker.git`.
5. Download the `checkpoint` and `gfpgan` models [below↓](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
6. Run `start.bat` from Windows Explorer as a normal, non-administrator user; a gradio WebUI demo will start.
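The command-line tools installed by the steps above can be sanity-checked before launching anything. A minimal sketch (not part of SadTalker; the tool names are simply the ones the steps install):

```python
import shutil

def missing_tools(tools):
    """Return the subset of `tools` not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]

if __name__ == "__main__":
    # git and ffmpeg are required by the installation steps above.
    missing = missing_tools(["git", "ffmpeg"])
    if missing:
        print("Please install:", ", ".join(missing))
    else:
        print("All prerequisites found.")
```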

### Macbook:

More tips about installation on Macbook and the Docker file can be found [here](docs/install.md).

## 📥 2. Download Trained Models.

You can run the following script to put all the models in the right place.

```bash
bash scripts/download_models.sh
```

Other alternatives:
> We also provide an offline patch (`gfpgan/`), so no model will be downloaded when generating.

**Google Drive**: download our pre-trained model from [this link (main checkpoints)](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing) and [gfpgan (offline patch)](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing).

**GitHub Release Page**: download all the files from the [latest GitHub release page](https://github.com/Winfredy/SadTalker/releases), and then put them in `./checkpoints`.

**Baidu cloud drive (百度云盘)**: we provide the models at [checkpoints, extraction code (提取码): sadt](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) and [gfpgan, extraction code (提取码): sadt](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt).

<details><summary>Model Details</summary>

Model explanations:

##### New version
| Model | Description |
| :--- | :---------- |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints (old version, 256 face render). |
| checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints (old version, 512 face render). |
| gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`. |

##### Old version
| Model | Description |
| :--- | :---------- |
| checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
| checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
| checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker. |
| checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [the reappearance of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis). |
| checkpoints/epoch_20.pth | Pre-trained 3DMM extractor in [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction). |
| checkpoints/wav2lip.pth | Highly accurate lip-sync model from [Wav2lip](https://github.com/Rudrabha/Wav2Lip). |
| checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/). |
| checkpoints/BFM | 3DMM library files. |
| checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment). |
| gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`. |

The final folder layout will look like this:

<img width="331" alt="image" src="https://user-images.githubusercontent.com/4397546/232511411-4ca75cbf-a434-48c5-9ae0-9009e8316484.png">

</details>
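After downloading, a quick way to confirm the files landed in the right place is to check for them from Python. A minimal sketch (the paths are taken from the "new version" table above; extend the list as needed):

```python
from pathlib import Path

# Checkpoint files from the "new version" table above.
EXPECTED = [
    "checkpoints/mapping_00229-model.pth.tar",
    "checkpoints/mapping_00109-model.pth.tar",
    "checkpoints/SadTalker_V0.0.2_256.safetensors",
    "checkpoints/SadTalker_V0.0.2_512.safetensors",
]

def missing_checkpoints(root, expected=EXPECTED):
    """Return the expected checkpoint paths that do not exist under `root`."""
    root = Path(root)
    return [p for p in expected if not (root / p).exists()]
```

Running `missing_checkpoints(".")` from the repo root should return an empty list once the download script has finished.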

## 🔮 3. Quick Start ([Best Practice](docs/best_practice.md)).

### WebUI Demos:

**Online**: [Huggingface](https://huggingface.co/spaces/vinthony/SadTalker) | [SDWebUI-Colab](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) | [Colab](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)

**Local Automatic1111 stable-diffusion webui extension**: please refer to the [Automatic1111 stable-diffusion webui docs](docs/webui_extension.md).

**Local gradio demo (highly recommended!)**: similar to our [hugging-face demo](https://huggingface.co/spaces/vinthony/SadTalker), it can be run by:

```bash
## You need to install TTS (https://github.com/coqui-ai/TTS) manually via `pip install tts` in advance.
python app_sadtalker.py
```

**Or launch the local demo via the provided scripts (highly recommended!)**:

- Windows: just double-click `webui.bat`; the requirements will be installed automatically.
- Linux/Mac OS: run `bash webui.sh` to start the webui.

### Manual usage:

##### Animating a portrait image with the default config:
```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --enhancer gfpgan
```
The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
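Since each run writes into a fresh timestamped folder, picking up the newest result programmatically can be sketched as follows (a minimal helper assuming the `results/<timestamp>/*.mp4` layout above):

```python
from pathlib import Path

def latest_result(results_dir="results"):
    """Return the most recently modified .mp4 under results/<timestamp>/, or None."""
    videos = sorted(
        Path(results_dir).glob("*/*.mp4"),
        key=lambda p: p.stat().st_mtime,
    )
    return videos[-1] if videos else None
```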

##### Full body/image Generation:

Use `--still` to generate a natural full-body video. You can add `--enhancer` to improve the quality of the generated video.

```bash
python inference.py --driven_audio <audio.wav> \
                    --source_image <video.mp4 or picture.png> \
                    --result_dir <a directory to store results> \
                    --still \
                    --preprocess full \
                    --enhancer gfpgan
```

More examples, configuration, and tips can be found in the [>>> best practice documents <<<](docs/best_practice.md).
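For many source images, the commands above can be driven from Python. A minimal sketch (the flag names come from the commands above; running it assumes you are in the repo root with the environment set up):

```python
import subprocess
from pathlib import Path

def build_command(audio, image, result_dir, still=True, enhancer="gfpgan"):
    """Assemble the inference.py invocation shown in the examples above."""
    cmd = [
        "python", "inference.py",
        "--driven_audio", str(audio),
        "--source_image", str(image),
        "--result_dir", str(result_dir),
        "--preprocess", "full",
        "--enhancer", enhancer,
    ]
    if still:
        cmd.append("--still")
    return cmd

def run_batch(audio, images, out_root="results"):
    """Animate every image in `images` with the same driving audio."""
    for image in images:
        out = Path(out_root) / Path(image).stem
        subprocess.run(build_command(audio, image, out), check=True)
```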

## 🛎 Citation

If you find our work useful in your research, please consider citing:

```bibtex
@article{zhang2022sadtalker,
  title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
  author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
  journal={arXiv preprint arXiv:2211.12194},
  year={2022}
}
```

## 💗 Acknowledgements

The facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. In the training process, we also use models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip). We thank them for their wonderful work.

See also these wonderful third-party libraries we use:

- **Face Utils**: https://github.com/xinntao/facexlib
- **Face Enhancement**: https://github.com/TencentARC/GFPGAN
- **Image/Video Enhancement**: https://github.com/xinntao/Real-ESRGAN

## 🥂 Extensions:

- [SadTalker-Video-Lip-Sync](https://github.com/Zz-ww/SadTalker-Video-Lip-Sync) from [@Zz-ww](https://github.com/Zz-ww): SadTalker for Video Lip Editing

## 🥂 Related Works

- [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)

## 📢 Disclaimer

This is not an official product of Tencent. This repository may only be used for personal/research/non-commercial purposes.

LOGO: color and font suggestion: [ChatGPT](ai.com); logo font: [Montserrat Alternates](https://fonts.google.com/specimen/Montserrat+Alternates?preview.text=SadTalker&preview.text_type=custom&query=mont).

All the copyright of the demo images and audio is from community users or generated by stable diffusion. Feel free to contact us if you feel uncomfortable.

Added (the new Hugging Face Space config):

title: VoiceChange
emoji: 👀
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 3.28.3
app_file: app_multi.py
pinned: false
license: mit
duplicated_from: BartPoint/VoiceChange