Real3DPortrait / README-zh.md
ameerazam08's picture
Upload folder using huggingface_hub
a5c5b03 verified

Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis | ICLR 2024 Spotlight

arXiv| GitHub Stars | English Readme

这个仓库是Real3D-Portrait的官方PyTorch实现, 用于实现单参考图(one-shot)、高视频真实度(video reality)的虚拟人视频合成。您可以访问我们的项目页面以观看Demo视频, 阅读我们的论文以了解技术细节。



快速上手!

安装环境

请参照环境配置文档,配置Conda环境real3dportrait

下载预训练与第三方模型

3DMM BFM模型

下载3DMM BFM模型:Google DriveBaiduYun Disk 提取码: m9q5

下载完成后,放置全部的文件到deep_3drecon/BFM里,文件结构如下:

deep_3drecon/BFM/
├── 01_MorphableModel.mat
├── BFM_exp_idx.mat
├── BFM_front_idx.mat
├── BFM_model_front.mat
├── Exp_Pca.bin
├── facemodel_info.mat
├── index_mp468_from_mesh35709.npy
├── mediapipe_in_bfm53201.npy
└── std_exp.txt

预训练模型

下载预训练的Real3D-Portrait:Google DriveBaiduYun Disk 提取码: 6x4f

下载完成后,放置全部的文件到checkpoints里并解压,文件结构如下:

checkpoints/
├── 240126_real3dportrait_orig
│   ├── audio2secc_vae
│   │   ├── config.yaml
│   │   └── model_ckpt_steps_400000.ckpt
│   └── secc2plane_torso_orig
│       ├── config.yaml
│       └── model_ckpt_steps_100000.ckpt
└── pretrained_ckpts
    └── mit_b0.pth

推理测试

我们目前提供了命令行(CLI)Gradio WebUI推理方式,并将在未来提供Google Colab方式。我们同时支持音频驱动(Audio-Driven)与视频驱动(Video-Driven):

  • 音频驱动场景下,需要至少提供source imagedriving audio
  • 视频驱动场景下,需要至少提供source imagedriving expression video

Gradio WebUI推理

启动Gradio WebUI,按照提示上传素材,点击Generate按钮即可推理:

python inference/app_real3dportrait.py

命令行推理

首先,切换至项目根目录并启用Conda环境:

cd <Real3DPortraitRoot>
conda activate real3dportrait
export PYTHON_PATH=./

音频驱动场景下,需要至少提供source image与driving audio,推理指令:

python inference/real3d_infer.py \
--src_img <PATH_TO_SOURCE_IMAGE> \
--drv_aud <PATH_TO_AUDIO> \
--drv_pose <PATH_TO_POSE_VIDEO, OPTIONAL> \
--bg_img <PATH_TO_BACKGROUND_IMAGE, OPTIONAL> \
--out_name <PATH_TO_OUTPUT_VIDEO, OPTIONAL>

视频驱动场景下,需要至少提供source image与driving expression video(作为drv_aud参数),推理指令:

python inference/real3d_infer.py \
--src_img <PATH_TO_SOURCE_IMAGE> \
--drv_aud <PATH_TO_EXP_VIDEO> \
--drv_pose <PATH_TO_POSE_VIDEO, OPTIONAL> \
--bg_img <PATH_TO_BACKGROUND_IMAGE, OPTIONAL> \
--out_name <PATH_TO_OUTPUT_VIDEO, OPTIONAL>

一些可选参数注释:

  • --drv_pose 指定时提供了运动pose信息,不指定则为静态运动
  • --bg_img 指定时提供了背景信息,不指定则为source image提取的背景
  • --mouth_amp 嘴部张幅参数,值越大张幅越大
  • --map_to_init_pose 值为True时,首帧的pose将被映射到source pose,后续帧也作相同变换
  • --temperature 代表audio2motion的采样温度,值越大结果越多样,但同时精确度越低
  • --out_name 不指定时,结果将保存在infer_out/tmp/
  • --out_mode 值为final时,只输出说话人视频;值为concat_debug时,同时输出一些可视化的中间结果

指令示例:

python inference/real3d_infer.py \
--src_img data/raw/examples/Macron.png \
--drv_aud data/raw/examples/Obama_5s.wav \
--drv_pose data/raw/examples/May_5s.mp4 \
--bg_img data/raw/examples/bg.png \
--out_name output.mp4 \
--out_mode concat_debug

ToDo

  • Release Pre-trained weights of Real3D-Portrait.
  • Release Inference Code of Real3D-Portrait.
  • Release Gradio Demo of Real3D-Portrait..
  • Release Google Colab of Real3D-Portrait..
  • Release Training Code of Real3D-Portrait.

引用我们

如果这个仓库对你有帮助,请考虑引用我们的工作:

@article{ye2024real3d,
  title={Real3D-Portrait: One-shot Realistic 3D Talking Portrait Synthesis},
  author={Ye, Zhenhui and Zhong, Tianyun and Ren, Yi and Yang, Jiaqi and Li, Weichuang and Huang, Jiawei and Jiang, Ziyue and He, Jinzheng and Huang, Rongjie and Liu, Jinglin and others},
  journal={arXiv preprint arXiv:2401.08503},
  year={2024}
}
@article{ye2023geneface++,
  title={GeneFace++: Generalized and Stable Real-Time Audio-Driven 3D Talking Face Generation},
  author={Ye, Zhenhui and He, Jinzheng and Jiang, Ziyue and Huang, Rongjie and Huang, Jiawei and Liu, Jinglin and Ren, Yi and Yin, Xiang and Ma, Zejun and Zhao, Zhou},
  journal={arXiv preprint arXiv:2305.00787},
  year={2023}
}
@article{ye2023geneface,
  title={GeneFace: Generalized and High-Fidelity Audio-Driven 3D Talking Face Synthesis},
  author={Ye, Zhenhui and Jiang, Ziyue and Ren, Yi and Liu, Jinglin and He, Jinzheng and Zhao, Zhou},
  journal={arXiv preprint arXiv:2301.13430},
  year={2023}
}