ameerazam08 committed
Commit
593a7ec
•
1 Parent(s): 2f6753a

Update README.md

Files changed (1)
  1. README.md +88 -1
README.md CHANGED
@@ -11,4 +11,91 @@ app_port: 7860
 ---
 
 
- ALL Setup for MuseTalk Clone and Run
+ All setup to clone and run MuseTalk
+
+
+ ```
+ Build environment
+ We recommend Python >= 3.10 and CUDA 11.7. Build the environment as follows:
+
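+ For example, first create a fresh environment; a minimal sketch assuming conda is used (the environment name is illustrative):
+
+ conda create -n musetalk python=3.10
+ conda activate musetalk
+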
+ pip install -r requirements.txt
+
+ Install the mmlab packages:
+
+ pip install --no-cache-dir -U openmim
+ mim install mmengine
+ mim install "mmcv>=2.0.1"
+ mim install "mmdet>=3.1.0"
+ mim install "mmpose>=1.1.0"
+
+ Download ffmpeg-static
+ Download a static ffmpeg build and point FFMPEG_PATH at it:
+
+ export FFMPEG_PATH=/path/to/ffmpeg
+
+ for example:
+
+ export FFMPEG_PATH=/musetalk/ffmpeg-4.4-amd64-static
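+
+ A minimal sketch, assuming the static tarball has already been downloaded (the file name is illustrative and matches the example above):
+
+ tar -xvf ffmpeg-4.4-amd64-static.tar.xz
+ export FFMPEG_PATH=$PWD/ffmpeg-4.4-amd64-static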
+
+ Download weights
+ You can download the weights manually as follows (a download sketch is given after the layout below):
+
+ Download our trained weights.
+
+ Download the weights of the other components:
+
+ sd-vae-ft-mse
+ whisper
+ dwpose
+ face-parse-bisent
+ resnet18
+
+ Finally, these weights should be organized in ./models as follows:
+
+ ./models/
+ ├── musetalk
+ │   ├── musetalk.json
+ │   └── pytorch_model.bin
+ ├── dwpose
+ │   └── dw-ll_ucoco_384.pth
+ ├── face-parse-bisent
+ │   ├── 79999_iter.pth
+ │   └── resnet18-5c106cde.pth
+ ├── sd-vae-ft-mse
+ │   ├── config.json
+ │   └── diffusion_pytorch_model.bin
+ └── whisper
+     └── tiny.pt
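+
+ A minimal download sketch; the stabilityai/sd-vae-ft-mse repo ID and the torchvision resnet18 URL are the standard public sources, while the remaining weights (musetalk, whisper tiny.pt, dwpose, face-parse-bisent) should be fetched from the links in the MuseTalk repository:
+
+ mkdir -p models/musetalk models/dwpose models/face-parse-bisent models/sd-vae-ft-mse models/whisper
+ # VAE weights from the Hugging Face Hub
+ huggingface-cli download stabilityai/sd-vae-ft-mse config.json diffusion_pytorch_model.bin --local-dir models/sd-vae-ft-mse
+ # resnet18 checkpoint from the standard torchvision URL
+ wget -P models/face-parse-bisent https://download.pytorch.org/models/resnet18-5c106cde.pth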
+
+ Quickstart
+ Inference
+ Here, we provide the inference script:
+
+ python -m scripts.inference --inference_config configs/inference/test.yaml
+
+ configs/inference/test.yaml is the path to the inference configuration file, which includes video_path and audio_path. The video_path should be either a video file, an image file, or a directory of images.
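+
+ A hypothetical sketch of such a configuration file (only the video_path and audio_path keys are documented above; the task grouping and file paths are assumptions, so follow the sample config shipped with the repository):
+
+ cat > configs/inference/my_test.yaml <<'EOF'
+ task_0:
+   video_path: "data/video/sample.mp4"
+   audio_path: "data/audio/sample.wav"
+ EOF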
+
+ We recommend input video at 25 fps, the same frame rate used when training the model. If your video's frame rate is far below 25 fps, apply frame interpolation or convert it directly to 25 fps with ffmpeg.
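+
+ For example (input and output file names are illustrative):
+
+ ffmpeg -i input.mp4 -r 25 output_25fps.mp4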
+
+ Using bbox_shift for adjustable results
+ 🔎 We have found that the upper bound of the mask has an important impact on mouth openness. Thus, to control the mask region, we suggest using the bbox_shift parameter: positive values (moving towards the lower half) increase mouth openness, while negative values (moving towards the upper half) decrease it.
+
+ You can start by running with the default configuration to obtain the adjustable value range, and then re-run the script within this range.
+
+ For example, in the case of Xinying Sun, running the default configuration shows that the adjustable value range is [-9, 9]. Then, to decrease the mouth openness, we set the value to -7:
+
+ python -m scripts.inference --inference_config configs/inference/test.yaml --bbox_shift -7
+
+ 📌 More technical details can be found in the bbox_shift documentation.
+
+ Combining MuseV and MuseTalk
+ As a complete solution for virtual human generation, we suggest first applying MuseV to generate a video (text-to-video, image-to-video, or pose-to-video) by referring to its documentation. Frame interpolation is suggested to increase the frame rate. Then, use MuseTalk to generate a lip-sync video by referring to the instructions above.
+
+ 🆕 Real-time inference
+ Here, we provide the real-time inference script. It applies the necessary pre-processing, such as face detection, face parsing, and VAE encoding, in advance. During inference, only the UNet and the VAE decoder are involved, which makes MuseTalk real-time:
+
+ python -m scripts.realtime_inference --inference_config configs/inference/realtime.yaml --batch_size 4
+
+ configs/inference/realtime.yaml is the path to the real-time inference configuration file, which includes preparation, video_path, bbox_shift, and audio_clips.
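+
+ A hypothetical sketch of such a configuration file (only the preparation, video_path, bbox_shift and audio_clips keys are documented above; the avatar grouping, values and sample paths are assumptions, so follow the sample config shipped with the repository):
+
+ cat > configs/inference/my_realtime.yaml <<'EOF'
+ avatar_1:
+   preparation: True
+   bbox_shift: 5
+   video_path: "data/video/sample.mp4"
+   audio_clips:
+     audio_0: "data/audio/yongen.wav"
+ EOF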
+
+ Set preparation to True in realtime.yaml to prepare the materials for a new avatar. (If bbox_shift has changed, you also need to re-prepare the materials.)
+ After that, the avatar will use an audio clip selected from audio_clips to generate video:
+ Inferring using: data/audio/yongen.wav
+ While MuseTalk is inferring, sub-threads can simultaneously stream the results to users. The generation process can achieve 30+ fps on an NVIDIA Tesla V100.
+ Set preparation to False and run this script again if you want to generate more videos with the same avatar.
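+
+ A sketch of the two-step workflow (the preparation value is edited in realtime.yaml between the two runs):
+
+ # 1) preparation: True -- prepare materials for a new avatar
+ python -m scripts.realtime_inference --inference_config configs/inference/realtime.yaml --batch_size 4
+ # 2) preparation: False -- reuse the prepared avatar to generate more videos
+ python -m scripts.realtime_inference --inference_config configs/inference/realtime.yaml --batch_size 4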
+
+ Note for real-time inference
+ If you want to generate multiple videos using the same avatar/video, you can also use this script to significantly speed up the generation process.
+ In the previous script, the generation time is also limited by I/O (e.g. saving images). If you just want to test the generation speed without saving images, you can run:
+
+ python -m scripts.realtime_inference --inference_config configs/inference/realtime.yaml --skip_save_images
+
+ ```