vinthony committed on
Commit
aa0d6ca
1 Parent(s): a22eb82
Files changed (1)
  1. README.md +13 -194
README.md CHANGED
@@ -1,194 +1,13 @@
- <div align="center">
-
- <h2> 😭 SadTalker: <span style="font-size:12px">Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation </span> </h2>
-
- <a href='https://arxiv.org/abs/2211.12194'><img src='https://img.shields.io/badge/ArXiv-2211.12194-red'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<a href='https://sadtalker.github.io'><img src='https://img.shields.io/badge/Project-Page-Green'></a> &nbsp;&nbsp;&nbsp;&nbsp;&nbsp; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb)
-
- <div>
- <a target='_blank'>Wenxuan Zhang <sup>*,1,2</sup></a>&emsp;
- <a href='https://vinthony.github.io/' target='_blank'>Xiaodong Cun <sup>*,2</sup></a>&emsp;
- <a href='https://xuanwangvc.github.io/' target='_blank'>Xuan Wang <sup>3</sup></a>&emsp;
- <a href='https://yzhang2016.github.io/' target='_blank'>Yong Zhang <sup>2</sup></a>&emsp;
- <a href='https://xishen0220.github.io/' target='_blank'>Xi Shen <sup>2</sup></a>&emsp; <br>
- <a href='https://yuguo-xjtu.github.io/' target='_blank'>Yu Guo <sup>1</sup></a>&emsp;
- <a href='https://scholar.google.com/citations?hl=zh-CN&user=4oXBp9UAAAAJ' target='_blank'>Ying Shan <sup>2</sup></a>&emsp;
- <a target='_blank'>Fei Wang <sup>1</sup></a>&emsp;
- </div>
- <br>
- <div>
- <sup>1</sup> Xi'an Jiaotong University &emsp; <sup>2</sup> Tencent AI Lab &emsp; <sup>3</sup> Ant Group &emsp;
- </div>
- <br>
- <i><strong><a href='https://arxiv.org/abs/2211.12194' target='_blank'>CVPR 2023</a></strong></i>
- <br>
- <br>
-
- ![sadtalker](https://user-images.githubusercontent.com/4397546/222490039-b1f6156b-bf00-405b-9fda-0c9a9156f991.gif)
-
- <b>TL;DR: A realistic and stylized talking-head video generation method from a single image and audio.</b>
-
- <br>
-
- </div>
-
-
- ## 📋 Changelog
-
-
- - __2023.03.22__: Launch new feature: generating 3D face animation from a single image. More applications based on it will follow.
-
- - __2023.03.22__: Launch new feature: `still mode`, where only a slight head pose is produced, via `python inference.py --still`.
- - __2023.03.18__: Support `expression intensity`; you can now change the intensity of the generated motion via `python inference.py --expression_scale 1.3` (any value > 1 strengthens the motion).
-
- - __2023.03.18__: Reorganized the data folders; you can now download the checkpoints automatically using `bash scripts/download_models.sh`.
- - __2023.03.18__: We have officially integrated [GFPGAN](https://github.com/TencentARC/GFPGAN) for face enhancement; use `python inference.py --enhancer gfpgan` for better visual quality.
- - __2023.03.14__: Pinned the version of the `joblib` package to fix errors when using `librosa`; [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/Winfredy/SadTalker/blob/main/quick_demo.ipynb) is online!
- &nbsp;&nbsp;&nbsp;&nbsp; <details><summary> Previous changelogs</summary>
- - 2023.03.06 Fixed some bugs in the code and errors in installation.
- - 2023.03.03 Released the test code for audio-driven single-image animation!
- - 2023.02.28 SadTalker has been accepted by CVPR 2023!
-
- </details>
-
- ## 🎼 Pipeline
- ![main_of_sadtalker](https://user-images.githubusercontent.com/4397546/222490596-4c8a2115-49a7-42ad-a2c3-3bb3288a5f36.png)
-
-
- ## 🚧 TODO
-
- - [x] Generating a 2D face from a single image.
- - [x] Generating a 3D face from audio.
- - [x] Generating 4D free-view talking examples from audio and a single image.
- - [x] Gradio/Colab demo.
- - [ ] Full body/image generation.
- - [ ] Training code for each component.
- - [ ] Audio-driven anime avatar.
- - [ ] Integrate ChatGPT for a conversation demo 🤔
- - [ ] Integrate with stable-diffusion-web-ui. (stay tuned!)
-
- https://user-images.githubusercontent.com/4397546/222513483-89161f58-83d0-40e4-8e41-96c32b47bd4e.mp4
-
-
- ## 🔮 Inference Demo!
-
- #### Dependency Installation
-
- <details><summary>CLICK ME</summary>
-
- ```bash
- git clone https://github.com/Winfredy/SadTalker.git
- cd SadTalker
- conda create -n sadtalker python=3.8
- source activate sadtalker
- pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 torchaudio==0.12.1 --extra-index-url https://download.pytorch.org/whl/cu113
- conda install ffmpeg
- pip install dlib-bin  # dlib-bin installs much faster than dlib; alternatively: conda install dlib
- pip install -r requirements.txt
-
- ### install gfpgan for the enhancer
- pip install git+https://github.com/TencentARC/GFPGAN
-
- ```
-
- </details>
-
- #### Trained Models
- <details><summary>CLICK ME</summary>
-
- You can run the following script to put all the models in the right place.
-
- ```bash
- bash scripts/download_models.sh
- ```
-
- Or download our pre-trained models from [Google Drive](https://drive.google.com/drive/folders/1Wd88VDoLhVzYsQ30_qDVluQr_Xm46yHT?usp=sharing) or our [GitHub release page](https://github.com/Winfredy/SadTalker/releases/tag/v0.0.1), then put them in `./checkpoints`.
-
- | Model | Description |
- | :--- | :---------- |
- | checkpoints/auido2exp_00300-model.pth | Pre-trained ExpNet in SadTalker. |
- | checkpoints/auido2pose_00140-model.pth | Pre-trained PoseVAE in SadTalker. |
- | checkpoints/mapping_00229-model.pth.tar | Pre-trained MappingNet in SadTalker. |
- | checkpoints/facevid2vid_00189-model.pth.tar | Pre-trained face-vid2vid model from [an unofficial reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis). |
- | checkpoints/epoch_20.pth | Pre-trained 3DMM extractor from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction). |
- | checkpoints/wav2lip.pth | Highly accurate lip-sync model from [Wav2lip](https://github.com/Rudrabha/Wav2Lip). |
- | checkpoints/shape_predictor_68_face_landmarks.dat | Face landmark model used in [dlib](http://dlib.net/). |
- | checkpoints/BFM | 3DMM library files. |
- | checkpoints/hub | Face detection models used in [face alignment](https://github.com/1adrianb/face-alignment). |
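-
- Either way, a quick sanity check that the files listed in the table above ended up under `./checkpoints` can save a failed run later. A minimal sketch, assuming the default `./checkpoints` layout (adjust the path if you placed the models elsewhere):
-
- ```bash
- # Sketch: list the expected checkpoint files; any "No such file or directory"
- # line points to a missing or misplaced download.
- ls -lh checkpoints/auido2exp_00300-model.pth \
-        checkpoints/auido2pose_00140-model.pth \
-        checkpoints/mapping_00229-model.pth.tar \
-        checkpoints/facevid2vid_00189-model.pth.tar \
-        checkpoints/epoch_20.pth \
-        checkpoints/wav2lip.pth \
-        checkpoints/shape_predictor_68_face_landmarks.dat
- ```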
-
- </details>
-
- #### Generating a 2D face from a single image
-
- ```bash
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <video.mp4 or picture.png> \
-                     --batch_size <default is 2; larger runs faster> \
-                     --expression_scale <default is 1.0; a larger value makes the motion stronger> \
-                     --result_dir <a folder to store the results> \
-                     --enhancer <default is None; choose gfpgan or RestoreFormer>
- ```
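-
- For reference, a concrete run combining the flags mentioned above (still mode, stronger expressions, GFPGAN enhancement) might look like the sketch below. `examples/source_image/art_0.png` ships with the repository; the audio path is only an illustrative placeholder, substitute any speech `.wav` file you have:
-
- ```bash
- # Illustrative sketch only: <your_audio>.wav is a placeholder, not a file in the repo.
- python inference.py --driven_audio <your_audio>.wav \
-                     --source_image examples/source_image/art_0.png \
-                     --still \
-                     --expression_scale 1.3 \
-                     --enhancer gfpgan \
-                     --result_dir ./results
- ```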
-
- <!-- ###### The effectiveness of the enhancer `gfpgan`. -->
-
- | basic | w/ still mode | w/ exp_scale 1.3 | w/ gfpgan |
- |:-------------: |:-------------: |:-------------: |:-------------: |
- | <video src="https://user-images.githubusercontent.com/4397546/226097707-bef1dd41-403e-48d3-a6e6-6adf923843af.mp4"></video> | <video src='https://user-images.githubusercontent.com/4397546/226804933-b717229f-1919-4bd5-b6af-bea7ab66cad3.mp4'></video> | <video style='width:256px' src="https://user-images.githubusercontent.com/4397546/226806013-7752c308-8235-4e7a-9465-72d8fc1aa03d.mp4"></video> | <video style='width:256px' src="https://user-images.githubusercontent.com/4397546/226097717-12a1a2a1-ac0f-428d-b2cb-bd6917aff73e.mp4"></video> |
-
- > Please unmute the videos; GitHub plays them without audio by default.
-
-
- <!-- <video src="./docs/art_0##japanese_still.mp4"></video> -->
-
-
- #### Generating a 3D face from audio
-
-
- | Input | Animated 3D face |
- |:-------------: | :-------------: |
- | <img src='examples/source_image/art_0.png' width='200px'> | <video src="https://user-images.githubusercontent.com/4397546/226856847-5a6a0a4d-a5ec-49e2-9b05-3206db65e8e3.mp4"></video> |
-
- > Please unmute the videos; GitHub plays them without audio by default.
-
- More details on generating the 3D face can be found [here](docs/face3d.md).
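-
- As a rough sketch of the invocation, assuming a `--face3dvis` switch enables the rendered 3D output; treat [docs/face3d.md](docs/face3d.md) as the authoritative reference for the exact flags:
-
- ```bash
- # Sketch only: --face3dvis is an assumption here; see docs/face3d.md for the exact command.
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <picture.png> \
-                     --result_dir ./results \
-                     --face3dvis
- ```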
-
- #### Generating 4D free-view talking examples from audio and a single image
-
- We use `camera_yaw`, `camera_pitch`, and `camera_roll` to control the camera pose. For example, `--camera_yaw -20 30 10` means the camera yaw changes from -20 to 30 degrees and then from 30 to 10.
- ```bash
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <video.mp4 or picture.png> \
-                     --result_dir <a folder to store the results> \
-                     --camera_yaw -20 30 10
- ```
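-
- Presumably `--camera_pitch` and `--camera_roll` accept the same start/mid/end triplet as `--camera_yaw`; a hedged sketch combining all three axes (the values are purely illustrative):
-
- ```bash
- # Sketch: the pitch/roll triplets are assumed to mirror the yaw format shown above.
- python inference.py --driven_audio <audio.wav> \
-                     --source_image <picture.png> \
-                     --result_dir ./results \
-                     --camera_yaw -20 30 10 \
-                     --camera_pitch -10 10 0 \
-                     --camera_roll -5 5 0
- ```
-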
- ![free_view](https://github.com/Winfredy/SadTalker/blob/main/docs/free_view_result.gif)
-
-
- ## 🛎 Citation
-
- If you find our work useful in your research, please consider citing:
-
- ```bibtex
- @article{zhang2022sadtalker,
-   title={SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation},
-   author={Zhang, Wenxuan and Cun, Xiaodong and Wang, Xuan and Zhang, Yong and Shen, Xi and Guo, Yu and Shan, Ying and Wang, Fei},
-   journal={arXiv preprint arXiv:2211.12194},
-   year={2022}
- }
- ```
-
- ## 💗 Acknowledgements
-
- The face-render code borrows heavily from [zhanglonghao's reproduction of face-vid2vid](https://github.com/zhanglonghao1992/One-Shot_Free-View_Neural_Talking_Head_Synthesis) and [PIRender](https://github.com/RenYurui/PIRender). We thank the authors for sharing their wonderful code. During training, we also use models from [Deep3DFaceReconstruction](https://github.com/microsoft/Deep3DFaceReconstruction) and [Wav2lip](https://github.com/Rudrabha/Wav2Lip); we thank them for their wonderful work.
-
-
- ## 🥂 Related Works
- - [StyleHEAT: One-Shot High-Resolution Editable Talking Face Generation via Pre-trained StyleGAN (ECCV 2022)](https://github.com/FeiiYin/StyleHEAT)
- - [CodeTalker: Speech-Driven 3D Facial Animation with Discrete Motion Prior (CVPR 2023)](https://github.com/Doubiiu/CodeTalker)
- - [VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild (SIGGRAPH Asia 2022)](https://github.com/vinthony/video-retalking)
- - [DPE: Disentanglement of Pose and Expression for General Video Portrait Editing (CVPR 2023)](https://github.com/Carlyx/DPE)
- - [3D GAN Inversion with Facial Symmetry Prior (CVPR 2023)](https://github.com/FeiiYin/SPI/)
- - [T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations (CVPR 2023)](https://github.com/Mael-zys/T2M-GPT)
-
- ## 📢 Disclaimer
-
- This is not an official product of Tencent. This repository may only be used for personal/research/non-commercial purposes.
 
+ ---
+ title: SadTalker
+ emoji: 🦀
+ colorFrom: purple
+ colorTo: green
+ sdk: gradio
+ sdk_version: 3.15.0
+ app_file: app.py
+ pinned: false
+ license: mit
+ ---
+
+ Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference