kevinwang676 committed on
Commit
64db85e
1 Parent(s): 595e4d5

Update README.md

Files changed (1)
  1. README.md +10 -268
README.md CHANGED
@@ -1,268 +1,10 @@
- <div align="center">
-
- <img src='https://user-images.githubusercontent.com/4397546/229094115-862c747e-7397-4b54-ba4a-bd368bfe2e0f.png' width='500px'/>
-
-
- <!--<h2> 😭 SadTalker: <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2> -->
-
- <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-PDF-red'></a> &nbsp; <a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) &nbsp; [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/vinthony/SadTalker) &nbsp; [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) &nbsp; [![Replicate](https://replicate.com/cjwbw/sadtalker/badge)](https://replicate.com/cjwbw/sadtalker)
-
- <div>
- <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup></a>&emsp;
- <a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</sup></a>&emsp;
- <a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp;
- <a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
- <a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp; <br>
- <a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo <sup>1</sup></a>&emsp;
- <a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup></a>&emsp;
- <a target='_blank'>Fei Wang <sup>1</sup></a>&emsp;
- </div>
- <br>
- <div>
- <sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp;
- </div>
- <br>
- <i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
- <br>
- <br>
-
-
- ![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)
-
- <b>TL;DR: &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; single portrait image 🙎‍♂️ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;+ &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; audio 🎤 &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; = &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; talking head video 🎞.</b>
-
- <br>
-
- </div>
-
-
-
- ## 🔥 Highlight
-
- - 🔥 The extension of the [stable-diffusion-webui](https://github.com/AUTOMATIC1111/stable-diffusion-webui) is online. Check out more details [here](docs/webui_extension.md).
-
- https://user-images.githubusercontent.com/4397546/231495639-5d4bb925-ea64-4a36-a519-6389917dac29.mp4
-
- - 🔥 `full image mode` is online! Check out [here](https://github.com/Winfredy/SadTalker#full-bodyimage-generation) for more details.
-
- | still + enhancer in v0.0.1 | still + enhancer in v0.0.2 | [input image @bagbag1815](https://twitter.com/bagbag1815/status/1642754319094108161) |
- |:--------------------: |:--------------------: | :----: |
- | <video src="https://user-images.githubusercontent.com/48216707/229484996-5d7be64f-2553-4c9e-a452-c5cf0b8ebafe.mp4" type="video/mp4"> </video> | <video src="https://user-images.githubusercontent.com/4397546/230717873-355b7bf3-d3de-49f9-a439-9220e623fce7.mp4" type="video/mp4"> </video> | <img src='./examples/source_image/full_body_2.png' width='380'> |
-
- - 🔥 Several new modes, e.g., `still mode`, `reference mode` and `resize mode`, are online for better and custom applications.
-
- - 🔥 Happy to see more community demos on [bilibili](https://search.bilibili.com/all?keyword=sadtalker&from_source=webtop_search&spm_id_from=333.1007&search_source=3), [YouTube](https://www.youtube.com/results?search_query=sadtalker&sp=CAM%253D) and [Twitter #sadtalker](https://twitter.com/search?q=%23sadtalker&src=typed_query).
-
- ## 📋 Changelog (previous changelogs can be found [here](docs/changlelog.md))
-
- - __[2023.06.12]__: Added more new features to the WebUI extension; see the discussion [here](https://github.com/OpenTalker/SadTalker/discussions/386).
-
- - __[2023.06.05]__: Released a new 512 beta face model. Fixed some bugs and improved performance.
-
- - __[2023.04.15]__: Added an Automatic1111 colab by @camenduru, thanks for this awesome colab: [![sd webui-colab](https://img.shields.io/badge/Automatic1111-Colab-green)](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb).
-
- - __[2023.04.12]__: Added a more detailed sd-webui installation document; fixed a reinstallation problem.
-
- - __[2023.04.12]__: Fixed the sd-webui safety issues caused by third-party packages; optimized the output path in `sd-webui-extension`.
-
- - __[2023.04.08]__: ❗️❗️❗️ In v0.0.2, we add a logo watermark to the generated video to prevent abuse, since the results are very realistic.
-
- - __[2023.04.08]__: v0.0.2: full image animation, added a Baidu drive link for downloading checkpoints, and optimized the enhancer logic.
-
-
- ## 🚧 TODO: see the discussion at https://github.com/OpenTalker/SadTalker/issues/280
-
- ## If you have any problem, please check our [FAQ](docs/FAQ.md) before opening an issue.
-
-
-
- ## ⚙️ 1. Installation.
-
- Tutorials from communities: [Chinese Windows tutorial](https://www.bilibili.com/video/BV1Dc411W7V6/) | [Japanese tutorial](https://br-d.fanbox.cc/posts/5685086?utm_campaign=manage_post_page&utm_medium=share&utm_source=twitter)
-
- ### Linux:
-
- 1. Install [anaconda](https://www.anaconda.com/), Python and git.
-
- 2. Create the environment and install the requirements.
- ```bash
- git clone https://github.com/Winfredy/SadTalker.git
-
- cd SadTalker
-
- conda create -n sadtalker python=3.8
-
- conda activate sadtalker
-
- pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
-
- conda install ffmpeg
-
- pip install -r requirements.txt
-
- ### TTS is optional, for the gradio demo only.
- ### pip install TTS
-
- ```
- ### Windows ([Chinese Windows tutorial](https://www.bilibili.com/video/BV1Dc411W7V6/)):
-
- 1. Install [Python 3.10.6](https://www.python.org/downloads/windows/), checking "Add Python to PATH".
- 2. Install [git](https://git-scm.com/download/win) manually (or `scoop install git` via [scoop](https://scoop.sh/)).
- 3. Install `ffmpeg`, following [this instruction](https://www.wikihow.com/Install-FFmpeg-on-Windows) (or `scoop install ffmpeg` via [scoop](https://scoop.sh/)).
- 4. Download our SadTalker repository, for example by running `git clone https://github.com/Winfredy/SadTalker.git`.
- 5. Download the `checkpoint` and `gfpgan` models [below↓](https://github.com/Winfredy/SadTalker#-2-download-trained-models).
- 6. Run `start.bat` from Windows Explorer as a normal, non-administrator user; a Gradio WebUI demo will start.
-
- ### macOS:
-
- More tips about installation on macOS and the Dockerfile can be found [here](docs/install.md).
-
- ## 📥 2. Download Trained Models.
-
- You can run the following script to put all the models in the right place.
-
- ```bash
- bash scripts/download_models.sh
- ```
-
- Other alternatives:
- > We also provide an offline patch (`gfpgan/`), so no model will be downloaded when generating.
-
- **Google Drive**: download our pre-trained model from [this link (main checkpoints)](https://drive.google.com/file/d/1gwWh45pF7aelNP_P78uDJL8Sycep-K7j/view?usp=sharing) and [gfpgan (offline patch)](https://drive.google.com/file/d/19AIBsmfcHW6BRJmeqSFlG5fL445Xmsyi?usp=sharing).
-
- **GitHub Release Page**: download all the files from the [latest GitHub release page](https://github.com/Winfredy/SadTalker/releases), and then put them in `./checkpoints`.
-
- **Baidu drive (百度云盘)**: we provide the models in [checkpoints, access code: sadt](https://pan.baidu.com/s/1P4fRgk9gaSutZnn8YW034Q?pwd=sadt) and [gfpgan, access code: sadt](https://pan.baidu.com/s/1kb1BCPaLOWX1JJb9Czbn6w?pwd=sadt).
-
-
-
- <details><summary>Model Details</summary>
-
-
- Model explanation:
-
- ##### New version
- | Model | Description
- | :--- | :----------
- |checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker.
- |checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker.
- |checkpoints/SadTalker_V0.0.2_256.safetensors | Packaged SadTalker checkpoints (old version, 256 face render).
- |checkpoints/SadTalker_V0.0.2_512.safetensors | Packaged SadTalker checkpoints (old version, 512 face render).
- |gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
-
-
- ##### Old version
- | Model | Description
- | :--- | :----------
- |checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker.
- |checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker.
- |checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker.
- |checkpoints/mapping_00109-model.pth.tar | Pre-trained MappingNet in SadTalker.
- |checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [the reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis).
- |checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction).
- |checkpoints/wav2lip.pth | Highly accurate lip-sync model from [Wav2lip](https://github.com/Rudrabha/Wav2Lip).
- |checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/).
- |checkpoints/BFM | 3DMM library files.
- |checkpoints/hub | Face detection models used in [face-alignment](https://github.com/1adrianb/face-alignment).
- |gfpgan/weights | Face detection and enhancement models used in `facexlib` and `gfpgan`.
-
- The final folder layout will be:
-
- <img width="331" alt="image" src="https://user-images.githubusercontent.com/4397546/232511411-4ca75cbf-a434-48c5-9ae0-9009e8316484.png">
-
-
- </details>
-
- ## 🔮 3. Quick Start ([Best Practice](docs/best_practice.md)).
-
- ### WebUI Demos:
-
- **Online**: [Hugging Face](https://huggingface.co/spaces/vinthony/SadTalker) | [SDWebUI-Colab](https://colab.research.google.com/github/camenduru/stable-diffusion-webui-colab/blob/main/video/stable/stable_diffusion_1_5_video_webui_colab.ipynb) | [Colab](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)
-
- **Local Automatic1111 stable-diffusion webui extension**: please refer to the [Automatic1111 stable-diffusion webui docs](docs/webui_extension.md).
-
- **Local gradio demo (highly recommended!)**: similar to our [Hugging Face demo](https://huggingface.co/spaces/vinthony/SadTalker), it can be run by:
-
- ```bash
- ## you need to manually install TTS (https://github.com/coqui-ai/TTS) via `pip install TTS` in advance.
- python app_sadtalker.py
- ```
-
- **Local gradio demo (highly recommended!)**:
-
- - Windows: just double-click `webui.bat`; the requirements will be installed automatically.
- - Linux/macOS: run `bash webui.sh` to start the webui.
-
-
- ### Manual usage:
-
- ##### Animating a portrait image with the default config:
- ```bash
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <video.mp4 or picture.png> \
-                     --enhancer gfpgan
- ```
- The results will be saved in `results/$SOME_TIMESTAMP/*.mp4`.
-
- ##### Full body/image generation:
-
- Use `--still` to generate a natural full-body video. You can add `--enhancer` to improve the quality of the generated video.
-
- ```bash
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <video.mp4 or picture.png> \
-                     --result_dir <a folder to store results> \
-                     --still \
-                     --preprocess full \
-                     --enhancer gfpgan
- ```
-
- More examples, configurations and tips can be found in the [>>> best practice documents <<<](docs/best_practice.md).
-
- ## 🛎 Citation
-
- If you find our work useful in your research, please consider citing:
-
- ```bibtex
- @article{zhang2022sadtalker,
-   title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
-   author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
-   journal={arXiv preprint arXiv:2211.12194},
-   year={2022}
- }
- ```
-
-
-
- ## 💗 Acknowledgements
-
- Facerender code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. In the training process, we also use models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip). We thank them for their wonderful work.
-
- See also these wonderful third-party libraries we use:
-
- - **Face Utils**: https://github.com/xinntao/facexlib
- - **Face Enhancement**: https://github.com/TencentARC/GFPGAN
- - **Image/Video Enhancement**: https://github.com/xinntao/Real-ESRGAN
-
- ## 🥂 Extensions:
-
- - [SadTalker-Video-Lip-Sync](https://github.com/Zz-ww/SadTalker-Video-Lip-Sync) from [@Zz-ww](https://github.com/Zz-ww): SadTalker for video lip editing
-
- ## 🥂 Related Works
- - [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- - [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- - [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- - [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- - [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- - [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)
-
- ## 📢 Disclaimer
-
- This is not an official product of Tencent. This repository can only be used for personal/research/non-commercial purposes.
-
- LOGO: color and font suggestion: [ChatGPT](ai.com); logo font: [Montserrat Alternates](https://fonts.google.com/specimen/Montserrat+Alternates?preview.text=SadTalker&preview.text_type=custom&query=mont).
-
- All copyright of the demo images and audio belongs to community users, or the media was generated by Stable Diffusion. Feel free to contact us if you feel uncomfortable.
 
+ title: VoiceChange
+ emoji: 👀
+ colorFrom: blue
+ colorTo: purple
+ sdk: gradio
+ sdk_version: 3.28.3
+ app_file: app_multi.py
+ pinned: false
+ license: mit
+ duplicated_from: BartPoint/VoiceChange
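The ten added lines are the Hugging Face Spaces configuration header that replaces the old README body. In the repository file itself, Spaces reads this metadata from a YAML front-matter block delimited by `---` at the very top of `README.md`; a sketch of the resulting file header (the `---` delimiters are implied by the Spaces convention, not shown in the diff):

```yaml
---
title: VoiceChange            # display name of the Space
emoji: 👀                     # thumbnail emoji
colorFrom: blue               # card gradient start
colorTo: purple               # card gradient end
sdk: gradio                   # runtime SDK
sdk_version: 3.28.3           # pinned Gradio version
app_file: app_multi.py        # entry point launched by the Space
pinned: false
license: mit
duplicated_from: BartPoint/VoiceChange   # source Space this was duplicated from
---
```

With this header in place, the Space builds a Gradio 3.28.3 environment and runs `app_multi.py` as its entry point.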