Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation

🔥🔥🔥 更新!!

Nov 5, 2024: 💬 已经支持图生3D。请在script体验。
Nov 5, 2024: 💬 已经支持文生3D，请在script体验。

📑 开源计划

Inference
Checkpoints
Baking related
Training
ComfyUI
Distillation Version
TensorRT Version

概要

为了解决现有的3D生成模型在生成速度和泛化能力上存在不足，我们开源了混元3D-1.0模型，可以帮助3D创作者和艺术家自动化生产3D资产。我们的模型采用两阶段生成方法，在保证质量和可控的基础上，仅需10秒即可生成3D资产。在第一阶段，我们采用了一种多视角扩散模型，轻量版模型能够在大约4秒内高效生成多视角图像，这些多视角图像从不同的视角捕捉了3D资产的丰富的纹理和几何先验，将任务从单视角重建松弛到多视角重建。在第二阶段，我们引入了一种前馈重建模型，利用上一阶段生成的多视角图像。该模型能够在大约3秒内快速而准确地重建3D资产。重建模型学习处理多视角扩散引入的噪声和不一致性，并利用条件图像中的可用信息高效恢复3D结构。最终，该模型可以实现输入任意单视角实现三维生成。

🎉 Hunyuan3D-1.0 模型架构

📈 比较

通过和其他开源模型比较, 混元3D-1.0在5项指标都得到了最高用户评分。细节请查看以下用户研究结果。

在A100显卡上，轻量版模型仅需10s即可完成单图生成3D，标准版则大约需要25s。以下散点图表明腾讯混元3D-1.0实现了质量和速度的合理平衡。

使用

复制代码仓库

git clone https://github.com/tencent/Hunyuan3D-1
cd Hunyuan3D-1

Linux系统安装

env_install.sh 脚本提供了如何安装环境：

# 第一步：创建环境
conda create -n hunyuan3d-1 python=3.9 or 3.10 or 3.11 or 3.12
conda activate hunyuan3d-1

# 第二部：安装torch和相关依赖包
which pip # check pip corresponds to python

# modify the cuda version according to your machine (recommended)
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121

# 第三步：安装其他相关依赖包
bash env_install.sh

由于dust3r的许可证限制, 我们仅提供其安装途径:

cd third_party
git clone --recursive https://github.com/naver/dust3r.git

cd ../third_party/weights
wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth

💡一些环境安装建议

可以选择安装 xformers 或 flash_attn 进行加速:

pip install xformers --index-url https://download.pytorch.org/whl/cu121

pip install flash_attn

Most environment errors are caused by a mismatch between machine and packages. You can try manually specifying the version, as shown in the following successful cases:

# python3.9
pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118

when install pytorch3d, the gcc version is preferably greater than 9, and the gpu driver should not be too old.

下载预训练模型

模型下载链接 https://huggingface.co/tencent/Hunyuan3D-1:

Hunyuan3D-1/lite, lite model for multi-view generation.
Hunyuan3D-1/std, standard model for multi-view generation.
Hunyuan3D-1/svrm, sparse-view reconstruction model.

为了通过Hugging Face下载模型，请先下载 huggingface-cli. (安装细节可见 here.)

python3 -m pip install "huggingface_hub[cli]"

请使用以下命令下载模型:

mkdir weights
huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights

mkdir weights/hunyuanDiT
huggingface-cli download Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled --local-dir ./weights/hunyuanDiT

推理

对于文生3D，我们支持中/英双语生成，请使用以下命令进行本地推理：

python3 main.py \
    --text_prompt "a lovely rabbit" \
    --save_folder ./outputs/test/ \
    --max_faces_num 90000 \
    --do_texture_mapping \
    --do_render

对于图生3D，请使用以下命令进行本地推理：

python3 main.py \
    --image_prompt "/path/to/your/image" \
    --save_folder ./outputs/test/ \
    --max_faces_num 90000 \
    --do_texture_mapping \
    --do_render

更多参数详解：

Argument	Default	Description
`--text_prompt`	None	The text prompt for 3D generation
`--image_prompt`	None	The image prompt for 3D generation
`--t2i_seed`	0	The random seed for generating images
`--t2i_steps`	25	The number of steps for sampling of text to image
`--gen_seed`	0	The random seed for generating 3d generation
`--gen_steps`	50	The number of steps for sampling of 3d generation
`--max_faces_numm`	90000	The limit number of faces of 3d mesh
`--save_memory`	False	module will move to cpu automatically
`--do_texture_mapping`	False	Change vertex shadding to texture shading
`--do_render`	False	render gif

如果显卡内存有限，可以使用--save_memory命令，最低显卡内存要求如下：

Inference Std-pipeline requires 30GB VRAM (24G VRAM with --save_memory).
Inference Lite-pipeline requires 22GB VRAM (18G VRAM with --save_memory).
Note: --save_memory will increase inference time

bash scripts/text_to_3d_std.sh 
bash scripts/text_to_3d_lite.sh 
bash scripts/image_to_3d_std.sh 
bash scripts/image_to_3d_lite.sh

如果你的显卡内存为16G，可以分别加载模型到显卡:

bash scripts/text_to_3d_std_separately.sh 'a lovely rabbit' ./outputs/test # >= 16G
bash scripts/text_to_3d_lite_separately.sh 'a lovely rabbit' ./outputs/test # >= 14G
bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/test  # >= 16G
bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G

Gradio界面部署

我们分别提供轻量版和标准版界面：

# std 
python3 app.py
python3 app.py --save_memory

# lite
python3 app.py --use_lite
python3 app.py --use_lite --save_memory

Gradio界面体验地址为 http://0.0.0.0:8080. 这里 0.0.0.0 应当填写运行模型的机器IP地址。

相机参数

生成多视图视角固定为

Azimuth (relative to input view): +0, +60, +120, +180, +240, +300.

引用

如果我们的仓库对您有帮助，请引用我们的工作

@misc{yang2024tencent,
    title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
    author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
    year={2024},
    eprint={2411.02293},
    archivePrefix={arXiv},
    primaryClass={cs.CV}
}