Huiwenshi committed on
Commit
68cd723
1 Parent(s): 69d0567

Upload folder using huggingface_hub

.gitignore ADDED
@@ -0,0 +1,42 @@
1
+ **/*~
2
+ **/*.bk
3
+ **/*.xx
4
+ **/*.so
5
+ **/*.ipynb
6
+ **/*.log
7
+ **/*.swp
8
+ **/*.zip
9
+ **/*.look
10
+ **/*.lock
11
+ **/*.think
12
+ **/dosth.sh
13
+ **/nohup.out
14
+ **/*polaris*
15
+ **/*egg*/
16
+ **/cl5/
17
+ **/tmp/
18
+ **/look/
19
+ **/temp/
20
+ **/build/
21
+ **/model/
22
+ **/log/
23
+ **/backup/
24
+ **/outputs/
25
+ **/work_dir/
26
+ **/work_dirs/
27
+ **/__pycache__/
28
+ **/.ipynb_checkpoints/
29
+ *.jpg
30
+ *.png
31
+ *.gif
32
+ ### PreCI ###
33
+ .codecc
34
+
35
+ app_hg.py
36
+ outputs
37
+ weights
38
+ .vscode/
39
+ baking
40
+ inference.py
41
+ third_party/weights
42
+ third_party/dust3r
README.md CHANGED
@@ -1,14 +1,5 @@
1
- ---
2
- title: Hunyuan3D-1.0
3
- emoji: 😻
4
- colorFrom: purple
5
- colorTo: red
6
- sdk: gradio
7
- sdk_version: 5.5.0
8
- app_file: app_hg.py
9
- pinned: false
10
- short_description: Text-to-3D and Image-to-3D Generation
11
- ---
12
  <!-- ## **Hunyuan3D-1.0** -->
13
 
14
  <p align="center">
@@ -19,7 +10,7 @@ short_description: Text-to-3D and Image-to-3D Generation
19
 
20
  <div align="center">
21
  <a href="https://github.com/tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
22
- <a href="https://3d.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Homepage&message=Tencent Hunyuan3D&color=blue&logo=github-pages"></a> &ensp;
23
  <a href="https://arxiv.org/pdf/2411.02293"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
24
  <a href="https://huggingface.co/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Checkpoints&message=HuggingFace&color=yellow"></a> &ensp;
25
  <a href="https://huggingface.co/spaces/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Demo&message=HuggingFace&color=yellow"></a> &ensp;
@@ -101,6 +92,19 @@ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
101
  # step 3. install other packages
102
  bash env_install.sh
103
  ```
104
  <details>
105
  <summary>💡Other tips for environment installation</summary>
106
 
@@ -204,6 +208,33 @@ bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/tes
204
  bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
205
  ```
206
 
207
  #### Using Gradio
208
 
209
  We have prepared two versions of multi-view generation, std and lite.
 
1
+ [English](README.md) | [简体中文](README_zh_cn.md)
2
+
3
  <!-- ## **Hunyuan3D-1.0** -->
4
 
5
  <p align="center">
 
10
 
11
  <div align="center">
12
  <a href="https://github.com/tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
13
+ <a href="https://3d.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Homepage&message=Tencent%20Hunyuan3D&color=blue&logo=github-pages"></a> &ensp;
14
  <a href="https://arxiv.org/pdf/2411.02293"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
15
  <a href="https://huggingface.co/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Checkpoints&message=HuggingFace&color=yellow"></a> &ensp;
16
  <a href="https://huggingface.co/spaces/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Demo&message=HuggingFace&color=yellow"></a> &ensp;
 
92
  # step 3. install other packages
93
  bash env_install.sh
94
  ```
95
+
96
+ Because of the dust3r license, we only provide an installation guide:
97
+
98
+ ```
99
+ cd third_party
100
+ git clone --recursive https://github.com/naver/dust3r.git
101
+
102
+ mkdir -p ../third_party/weights && cd ../third_party/weights
103
+ wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
104
+
105
+ ```
106
+
107
+
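Both `main.py` and `app.py` in this commit import `check_bake_available` from `third_party.check` and fall back to running without baking when the import or check fails; that helper's implementation is not included in this diff. A minimal sketch of what such an availability check could look like, assuming the dust3r paths used in the guide above and in the "Baking related" section below:

```python
# Hypothetical sketch only -- not the repo's actual third_party/check.py.
# Verifies that the dust3r code and the DUSt3R checkpoint were placed where
# the installation guides in this README put them.
import os

def check_bake_available(repo_root: str = ".") -> bool:
    code_dir = os.path.join(repo_root, "third_party", "dust3r")
    weights_dir = os.path.join(repo_root, "third_party", "weights")
    candidates = [
        os.path.join(weights_dir, "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth"),
        os.path.join(weights_dir, "DUSt3R_ViTLarge_BaseDecoder_512_dpt",
                     "DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth"),
    ]
    ok = os.path.isdir(code_dir) and any(os.path.isfile(p) for p in candidates)
    if not ok:
        print("dust3r code or weights not found; baking will be disabled")
    return ok

if __name__ == "__main__":
    print("baking available:", check_bake_available())
```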
108
  <details>
109
  <summary>💡Other tips for environment installation</summary>
110
 
 
208
  bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
209
  ```
210
 
211
+ #### Baking related
212
+
213
+ ```bash
214
+ cd ./third_party
215
+ git clone --recursive https://github.com/naver/dust3r.git
216
+
217
+ mkdir -p weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt
218
+ cd weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt
219
+
220
+ wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
221
+ cd ../../..
222
+ ```
223
+
224
+ Once the related code and weights are downloaded, the following additional arguments are available:
225
+
226
+ | Argument | Default | Description |
227
+ |:------------------:|:---------:|:---------------------------------------------------:|
228
+ |`--do_bake` | False | Bake the multi-view images onto the mesh |
229
+ |`--bake_align_times` | 3 | Number of times to align the multi-view images with the mesh |
230
+
231
+
232
+ Note: When running main.py, ensure that do_bake is set to True and do_texture_mapping is also set to True.
233
+
234
+ ```bash
235
+ python main.py ... --do_texture_mapping --do_bake (--do_render)
236
+ ```
237
+
238
  #### Using Gradio
239
 
240
  We have prepared two versions of multi-view generation, std and lite.
README_zh_cn.md ADDED
@@ -0,0 +1,242 @@
1
+ [English](README.md) | [简体中文](README_zh_cn.md)
2
+
3
+ <!-- ## **Hunyuan3D-1.0** -->
4
+
5
+ <p align="center">
6
+ <img src="./assets/logo.png" height=200>
7
+ </p>
8
+
9
+ # Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation
10
+
11
+ <div align="center">
12
+ <a href="https://github.com/tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Code&message=Github&color=blue&logo=github-pages"></a> &ensp;
13
+ <a href="https://3d.hunyuan.tencent.com"><img src="https://img.shields.io/static/v1?label=Homepage&message=Tencent%20Hunyuan3D&color=blue&logo=github-pages"></a> &ensp;
14
+ <a href="https://arxiv.org/pdf/2411.02293"><img src="https://img.shields.io/static/v1?label=Tech Report&message=Arxiv&color=red&logo=arxiv"></a> &ensp;
15
+ <a href="https://huggingface.co/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Checkpoints&message=HuggingFace&color=yellow"></a> &ensp;
16
+ <a href="https://huggingface.co/spaces/Tencent/Hunyuan3D-1"><img src="https://img.shields.io/static/v1?label=Demo&message=HuggingFace&color=yellow"></a> &ensp;
17
+ </div>
18
+
19
+
20
+ ## 🔥🔥🔥 更新!!
21
+
22
+ * Nov 5, 2024: 💬 已经支持图生3D。请在[script](#using-gradio)体验。
23
+ * Nov 5, 2024: 💬 已经支持文生3D,请在[script](#using-gradio)体验。
24
+
25
+
26
+ ## 📑 开源计划
27
+
28
+ - [x] Inference
29
+ - [x] Checkpoints
30
+ - [ ] Baking related
31
+ - [ ] Training
32
+ - [ ] ComfyUI
33
+ - [ ] Distillation Version
34
+ - [ ] TensorRT Version
35
+
36
+
37
+
38
+ ## **概要**
39
+ <p align="center">
40
+ <img src="./assets/teaser.png" height=450>
41
+ </p>
42
+
43
+ 为了解决现有的3D生成模型在生成速度和泛化能力上存在不足,我们开源了混元3D-1.0模型,可以帮助3D创作者和艺术家自动化生产3D资产。我们的模型采用两阶段生成方法,在保证质量和可控的基础上,仅需10秒即可生成3D资产。在第一阶段,我们采用了一种多视角扩散模型,轻量版模型能够在大约4秒内高效生成多视角图像,这些多视角图像从不同的视角捕捉了3D资产的丰富的纹理和几何先验,将任务从单视角重建松弛到多视角重建。在第二阶段,我们引入了一种前馈重建模型,利用上一阶段生成的多视角图像。该模型能够在大约3秒内快速而准确地重建3D资产。重建模型学习处理多视角扩散引入的噪声和不一致性,并利用条件图像中的可用信息高效恢复3D结构。最终,该模型可以实现输入任意单视角实现三维生成。
44
+
45
+
46
+ ## 🎉 **Hunyuan3D-1.0 模型架构**
47
+
48
+ <p align="center">
49
+ <img src="./assets/overview_3.png" height=400>
50
+ </p>
51
+
52
+
53
+ ## 📈 比较
54
+
55
+ 通过和其他开源模型比较, 混元3D-1.0在5项指标都得到了最高用户评分。细节请查看以下用户研究结果。
56
+
57
+ 在A100显卡上,轻量版模型仅需10s即可完成单图生成3D,标准版则大约需要25s。以下散点图表明腾讯混元3D-1.0实现了质量和速度的合理平衡。
58
+
59
+ <p align="center">
60
+ <img src="./assets/radar.png" height=300>
61
+ <img src="./assets/runtime.png" height=300>
62
+ </p>
63
+
64
+ ## 使用
65
+
66
+ #### 复制代码仓库
67
+
68
+ ```shell
69
+ git clone https://github.com/tencent/Hunyuan3D-1
70
+ cd Hunyuan3D-1
71
+ ```
72
+
73
+ #### Linux系统安装
74
+
75
+ env_install.sh 脚本提供了如何安装环境:
76
+
77
+ ```
78
+ # 第一步:创建环境
79
+ conda create -n hunyuan3d-1 python=3.9  # or 3.10 / 3.11 / 3.12
80
+ conda activate hunyuan3d-1
81
+
82
+ # 第二步:安装torch和相关依赖包
83
+ which pip # check pip corresponds to python
84
+
85
+ # modify the cuda version according to your machine (recommended)
86
+ pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
87
+
88
+ # 第三步:安装其他相关依赖包
89
+ bash env_install.sh
90
+ ```
91
+
92
+ 由于dust3r的许可证限制, 我们仅提供其安装途径:
93
+
94
+ ```
95
+ cd third_party
96
+ git clone --recursive https://github.com/naver/dust3r.git
97
+
98
+ mkdir -p ../third_party/weights && cd ../third_party/weights
99
+ wget https://download.europe.naverlabs.com/ComputerVision/DUSt3R/DUSt3R_ViTLarge_BaseDecoder_512_dpt.pth
100
+
101
+ ```
102
+
103
+
104
+ <details>
105
+ <summary>💡一些环境安装建议</summary>
106
+
107
+ 可以选择安装 xformers 或 flash_attn 进行加速:
108
+
109
+ ```
110
+ pip install xformers --index-url https://download.pytorch.org/whl/cu121
111
+ ```
112
+ ```
113
+ pip install flash_attn
114
+ ```
115
+
116
+ Most environment errors are caused by a mismatch between machine and packages. You can try manually specifying the version, as shown in the following successful cases:
117
+ ```
118
+ # python3.9
119
+ pip install torch==2.0.1 torchvision==0.15.2 --index-url https://download.pytorch.org/whl/cu118
120
+ ```
121
+
122
+ When installing pytorch3d, the gcc version should preferably be greater than 9, and the GPU driver should not be too old.
123
+
124
+ </details>
125
+
126
+ #### 下载预训练模型
127
+
128
+ 模型下载链接 [https://huggingface.co/tencent/Hunyuan3D-1](https://huggingface.co/tencent/Hunyuan3D-1):
129
+
130
+ + `Hunyuan3D-1/lite`, lite model for multi-view generation.
131
+ + `Hunyuan3D-1/std`, standard model for multi-view generation.
132
+ + `Hunyuan3D-1/svrm`, sparse-view reconstruction model.
133
+
134
+
135
+ 为了通过Hugging Face下载模型,请先下载 huggingface-cli. (安装细节可见 [here](https://huggingface.co/docs/huggingface_hub/guides/cli).)
136
+
137
+ ```shell
138
+ python3 -m pip install "huggingface_hub[cli]"
139
+ ```
140
+
141
+ 请使用以下命令下载模型:
142
+
143
+ ```shell
144
+ mkdir weights
145
+ huggingface-cli download tencent/Hunyuan3D-1 --local-dir ./weights
146
+
147
+ mkdir weights/hunyuanDiT
148
+ huggingface-cli download Tencent-Hunyuan/HunyuanDiT-v1.1-Diffusers-Distilled --local-dir ./weights/hunyuanDiT
149
+ ```
150
+
151
+ #### 推理
152
+ 对于文生3D,我们支持中/英双语生成,请使用以下命令进行本地推理:
153
+ ```bash
154
+ python3 main.py \
155
+ --text_prompt "a lovely rabbit" \
156
+ --save_folder ./outputs/test/ \
157
+ --max_faces_num 90000 \
158
+ --do_texture_mapping \
159
+ --do_render
160
+ ```
161
+
162
+ 对于图生3D,请使用以下命令进行本地推理:
163
+ ```bash
164
+ python3 main.py \
165
+ --image_prompt "/path/to/your/image" \
166
+ --save_folder ./outputs/test/ \
167
+ --max_faces_num 90000 \
168
+ --do_texture_mapping \
169
+ --do_render
170
+ ```
171
+ 更多参数详解:
172
+
173
+ | Argument | Default | Description |
174
+ |:------------------:|:---------:|:---------------------------------------------------:|
175
+ |`--text_prompt` | None |The text prompt for 3D generation |
176
+ |`--image_prompt` | None |The image prompt for 3D generation |
177
+ |`--t2i_seed` | 0 |The random seed for generating images |
178
+ |`--t2i_steps` | 25 |The number of steps for sampling of text to image |
179
+ |`--gen_seed` | 0 |The random seed for generating 3d generation |
180
+ |`--gen_steps` | 50 |The number of steps for sampling of 3d generation |
181
+ |`--max_faces_num` | 90000 |The maximum number of faces of the 3D mesh |
182
+ |`--save_memory` | False |Automatically move modules to CPU to save VRAM |
183
+ |`--do_texture_mapping` | False |Change vertex shading to texture shading |
184
+ |`--do_render` | False |Render a GIF |
185
+
186
+
187
+ 如果显卡内存有限,可以使用`--save_memory`命令,最低显卡内存要求如下:
188
+ - Inference Std-pipeline requires 30GB VRAM (24G VRAM with --save_memory).
189
+ - Inference Lite-pipeline requires 22GB VRAM (18G VRAM with --save_memory).
190
+ - Note: --save_memory will increase inference time
191
+
192
+ ```bash
193
+ bash scripts/text_to_3d_std.sh
194
+ bash scripts/text_to_3d_lite.sh
195
+ bash scripts/image_to_3d_std.sh
196
+ bash scripts/image_to_3d_lite.sh
197
+ ```
198
+
199
+ 如果你的显卡内存为16G,可以分别加载模型到显卡:
200
+ ```bash
201
+ bash scripts/text_to_3d_std_separately.sh 'a lovely rabbit' ./outputs/test # >= 16G
202
+ bash scripts/text_to_3d_lite_separately.sh 'a lovely rabbit' ./outputs/test # >= 14G
203
+ bash scripts/image_to_3d_std_separately.sh ./demos/example_000.png ./outputs/test # >= 16G
204
+ bash scripts/image_to_3d_lite_separately.sh ./demos/example_000.png ./outputs/test # >= 10G
205
+ ```
206
+
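An illustrative helper (not part of the repository) that turns the minimum-VRAM figures listed above into a pipeline suggestion; the thresholds simply restate the numbers in this section:

```python
# Illustrative only: suggest a pipeline variant from free GPU memory,
# using the minimum-VRAM figures quoted in this section of the README.
import torch

def suggest_pipeline(device: int = 0) -> str:
    free_bytes, _ = torch.cuda.mem_get_info(device)
    free_gb = free_bytes / 1024 ** 3
    if free_gb >= 30:
        return "std"
    if free_gb >= 22:
        return "lite (or std with --save_memory, needs >= 24G)"
    if free_gb >= 16:
        return "std via the *_std_separately.sh scripts"
    return "lite with --save_memory or the *_lite_separately.sh scripts"

if __name__ == "__main__":
    print(suggest_pipeline())
```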
207
+ #### Gradio界面部署
208
+
209
+ 我们分别提供轻量版和标准版界面:
210
+
211
+ ```shell
212
+ # std
213
+ python3 app.py
214
+ python3 app.py --save_memory
215
+
216
+ # lite
217
+ python3 app.py --use_lite
218
+ python3 app.py --use_lite --save_memory
219
+ ```
220
+
221
+ Gradio界面体验地址为 http://0.0.0.0:8080. 这里 0.0.0.0 应当填写运行模型的机器IP地址。
222
+
223
+ ## 相机参数
224
+
225
+ 生成多视图视角固定为
226
+
227
+ + Azimuth (relative to input view): `+0, +60, +120, +180, +240, +300`.
228
+
229
+
230
+ ## 引用
231
+
232
+ 如果我们的仓库对您有帮助,请引用我们的工作
233
+ ```bibtex
234
+ @misc{yang2024tencent,
235
+ title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
236
+ author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
237
+ year={2024},
238
+ eprint={2411.02293},
239
+ archivePrefix={arXiv},
240
+ primaryClass={cs.CV}
241
+ }
242
+ ```
app.py CHANGED
@@ -32,9 +32,21 @@ import torch
32
  import numpy as np
33
  from PIL import Image
34
  from einops import rearrange
 
35
 
36
  from infer import seed_everything, save_gif
37
  from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
38
 
39
  warnings.simplefilter('ignore', category=UserWarning)
40
  warnings.simplefilter('ignore', category=FutureWarning)
@@ -58,33 +70,19 @@ CONST_MAX_QUEUE = 1
58
  CONST_SERVER = '0.0.0.0'
59
 
60
  CONST_HEADER = '''
61
- <h2><b>Official 🤗 Gradio Demo</b></h2><h2><a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'><b>Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D
62
- Generationr</b></a></h2>
63
- Code: <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>GitHub</a>. Techenical report: <a href='https://arxiv.org/abs/placeholder' target='_blank'>ArXiv</a>.
64
-
65
- ❗️❗️❗️**Important Notes:**
66
- - By default, our demo can export a .obj mesh with vertex colors or a .glb mesh.
67
- - If you select "texture mapping," it will export a .obj mesh with a texture map or a .glb mesh.
68
- - If you select "render GIF," it will export a GIF image rendering of the .glb file.
69
- - If the result is unsatisfactory, please try a different seed value (Default: 0).
70
  '''
71
 
72
- CONST_CITATION = r"""
73
- If HunYuan3D-1 is helpful, please help to ⭐ the <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>Github Repo</a>. Thanks! [![GitHub Stars](https://img.shields.io/github/stars/tencent/Hunyuan3D-1?style=social)](https://github.com/tencent/Hunyuan3D-1)
74
- ---
75
- 📝 **Citation**
76
- If you find our work useful for your research or applications, please cite using this bibtex:
77
- ```bibtex
78
- @misc{yang2024tencent,
79
- title={Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation},
80
- author={Xianghui Yang and Huiwen Shi and Bowen Zhang and Fan Yang and Jiacheng Wang and Hongxu Zhao and Xinhai Liu and Xinzhou Wang and Qingxiang Lin and Jiaao Yu and Lifu Wang and Zhuo Chen and Sicong Liu and Yuhong Liu and Yong Yang and Di Wang and Jie Jiang and Chunchao Guo},
81
- year={2024},
82
- eprint={2411.02293},
83
- archivePrefix={arXiv},
84
- primaryClass={cs.CV}
85
- }
86
- ```
87
- """
88
 
89
  ################################################################
90
  # prepare text examples and image examples
@@ -129,6 +127,13 @@ worker_v23 = Views2Mesh(
129
  )
130
  worker_gif = GifRenderer(args.device)
131
 
132
  def stage_0_t2i(text, image, seed, step):
133
  os.makedirs('./outputs/app_output', exist_ok=True)
134
  exists = set(int(_) for _ in os.listdir('./outputs/app_output') if not _.startswith("."))
@@ -153,11 +158,11 @@ def stage_0_t2i(text, image, seed, step):
153
  dst = worker_xbg(image, save_folder)
154
  return dst, save_folder
155
 
156
- def stage_1_xbg(image, save_folder):
157
  if isinstance(image, str):
158
  image = Image.open(image)
159
  dst = save_folder + '/img_nobg.png'
160
- rgba = worker_xbg(image)
161
  rgba.save(dst)
162
  return dst
163
 
@@ -181,12 +186,9 @@ def stage_3_v23(
181
  seed,
182
  save_folder,
183
  target_face_count = 30000,
184
- do_texture_mapping = True,
185
- do_render =True
186
  ):
187
- do_texture_mapping = do_texture_mapping or do_render
188
- obj_dst = save_folder + '/mesh_with_colors.obj'
189
- glb_dst = save_folder + '/mesh.glb'
190
  worker_v23(
191
  views_pil,
192
  cond_pil,
@@ -195,149 +197,268 @@ def stage_3_v23(
195
  target_face_count = target_face_count,
196
  do_texture_mapping = do_texture_mapping
197
  )
198
  return obj_dst, glb_dst
199
 
200
- def stage_4_gif(obj_dst, save_folder, do_render_gif=True):
201
- if not do_render_gif: return None
202
- gif_dst = save_folder + '/output.gif'
203
- worker_gif(
204
- save_folder + '/mesh.obj',
205
- gif_dst_path = gif_dst
206
- )
207
  return gif_dst
208
  # ===============================================================
209
  # gradio display
210
  # ===============================================================
 
211
  with gr.Blocks() as demo:
212
  gr.Markdown(CONST_HEADER)
213
  with gr.Row(variant="panel"):
214
  with gr.Column(scale=2):
215
  with gr.Tab("Text to 3D"):
216
  with gr.Column():
217
- text = gr.TextArea('一只黑白相间的熊猫在白色背景上居中坐着,呈现出卡通风格和可爱氛围。', lines=1, max_lines=10, label='Input text')
 
218
  with gr.Row():
219
- textgen_seed = gr.Number(value=0, label="T2I seed", precision=0)
220
- textgen_step = gr.Number(value=25, label="T2I step", precision=0)
221
- textgen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
222
- textgen_STEP = gr.Number(value=50, label="Gen step", precision=0)
223
- textgen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
224
-
225
  with gr.Row():
226
- textgen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
227
- textgen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
228
  textgen_submit = gr.Button("Generate", variant="primary")
229
 
230
  with gr.Row():
231
- gr.Examples(examples=example_ts, inputs=[text], label="Txt examples", examples_per_page=10)
232
233
  with gr.Tab("Image to 3D"):
234
- with gr.Column():
235
- input_image = gr.Image(label="Input image",
236
- width=256, height=256, type="pil",
237
- image_mode="RGBA", sources="upload",
238
- interactive=True)
239
- with gr.Row():
240
- imggen_SEED = gr.Number(value=0, label="Gen seed", precision=0)
241
- imggen_STEP = gr.Number(value=50, label="Gen step", precision=0)
242
- imggen_max_faces = gr.Number(value=90000, label="max number of faces", precision=0)
243
 
244
- with gr.Row():
245
- imggen_do_texture_mapping = gr.Checkbox(label="texture mapping", value=False, interactive=True)
246
- imggen_do_render_gif = gr.Checkbox(label="Render gif", value=False, interactive=True)
247
- imggen_submit = gr.Button("Generate", variant="primary")
248
- with gr.Row():
249
- gr.Examples(
250
- examples=example_is,
251
- inputs=[input_image],
252
- label="Img examples",
253
- examples_per_page=10
254
- )
255
-
256
  with gr.Column(scale=3):
257
  with gr.Row():
258
  with gr.Column(scale=2):
259
- rem_bg_image = gr.Image(label="No backgraound image", type="pil",
260
- image_mode="RGBA", interactive=False)
261
  with gr.Column(scale=3):
262
- result_image = gr.Image(label="Multi views", type="pil", interactive=False)
263
-
264
- with gr.Row():
265
  result_3dobj = gr.Model3D(
266
  clear_color=[0.0, 0.0, 0.0, 0.0],
267
- label="Output Obj",
268
  show_label=True,
269
  visible=True,
270
  camera_position=[90, 90, None],
271
  interactive=False
272
  )
273
 
274
- result_3dglb = gr.Model3D(
275
  clear_color=[0.0, 0.0, 0.0, 0.0],
276
- label="Output Glb",
277
  show_label=True,
278
  visible=True,
279
  camera_position=[90, 90, None],
280
- interactive=False
281
- )
282
- result_gif = gr.Image(label="Rendered GIF", interactive=False)
283
 
284
- with gr.Row():
285
- gr.Markdown("""
286
- We recommend downloading and opening Glb with 3D software, such as Blender, MeshLab, etc.
287
-
288
- Limited by gradio, Obj file here only be shown as vertex shading, but Glb can be texture shading.
289
- """)
290
-
291
- #===============================================================
292
- # gradio running code
293
- #===============================================================
294
295
  none = gr.State(None)
296
  save_folder = gr.State()
297
  cond_image = gr.State()
298
  views_image = gr.State()
299
  text_image = gr.State()
300
 
 
301
  textgen_submit.click(
302
- fn=stage_0_t2i, inputs=[text, none, textgen_seed, textgen_step],
 
303
  outputs=[rem_bg_image, save_folder],
304
  ).success(
305
- fn=stage_2_i2v, inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
 
306
  outputs=[views_image, cond_image, result_image],
307
  ).success(
308
- fn=stage_3_v23, inputs=[views_image, cond_image, textgen_SEED, save_folder,
309
- textgen_max_faces, textgen_do_texture_mapping,
310
- textgen_do_render_gif],
311
- outputs=[result_3dobj, result_3dglb],
312
  ).success(
313
- fn=stage_4_gif, inputs=[result_3dglb, save_folder, textgen_do_render_gif],
314
  outputs=[result_gif],
315
  ).success(lambda: print('Text_to_3D Done ...'))
316
 
 
317
  imggen_submit.click(
318
- fn=stage_0_t2i, inputs=[none, input_image, textgen_seed, textgen_step],
 
319
  outputs=[text_image, save_folder],
320
  ).success(
321
- fn=stage_1_xbg, inputs=[text_image, save_folder],
 
322
  outputs=[rem_bg_image],
323
  ).success(
324
- fn=stage_2_i2v, inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
 
325
  outputs=[views_image, cond_image, result_image],
326
  ).success(
327
- fn=stage_3_v23, inputs=[views_image, cond_image, imggen_SEED, save_folder,
328
- imggen_max_faces, imggen_do_texture_mapping,
329
- imggen_do_render_gif],
330
- outputs=[result_3dobj, result_3dglb],
331
  ).success(
332
- fn=stage_4_gif, inputs=[result_3dglb, save_folder, imggen_do_render_gif],
 
333
  outputs=[result_gif],
334
  ).success(lambda: print('Image_to_3D Done ...'))
335
 
336
- #===============================================================
337
- # start gradio server
338
- #===============================================================
339
 
340
- gr.Markdown(CONST_CITATION)
341
  demo.queue(max_size=CONST_MAX_QUEUE)
342
  demo.launch(server_name=CONST_SERVER, server_port=CONST_PORT)
343
 
 
32
  import numpy as np
33
  from PIL import Image
34
  from einops import rearrange
35
+ import pandas as pd
36
 
37
  from infer import seed_everything, save_gif
38
  from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
39
+ from third_party.check import check_bake_available
40
+
41
+ try:
42
+ from third_party.mesh_baker import MeshBaker
43
+ BAKE_AVAILEBLE = True
44
+ except Exception as err:
45
+ print(err)
46
+ print("import baking related fail, run without baking")
47
+ check_bake_available()
48
+ BAKE_AVAILEBLE = False
49
+
50
 
51
  warnings.simplefilter('ignore', category=UserWarning)
52
  warnings.simplefilter('ignore', category=FutureWarning)
 
70
  CONST_SERVER = '0.0.0.0'
71
 
72
  CONST_HEADER = '''
73
+ <h2><a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'><b>Tencent Hunyuan3D-1.0: A Unified Framework for Text-to-3D and Image-to-3D Generation</b></a></h2>
74
+ ⭐️Technical report: <a href='https://arxiv.org/pdf/2411.02293' target='_blank'>ArXiv</a>. ⭐️Code: <a href='https://github.com/tencent/Hunyuan3D-1' target='_blank'>GitHub</a>.
75
  '''
76
 
77
+ CONST_NOTE = '''
78
+ ❗️❗️❗️Usage❗️❗️❗️<br>
79
+
80
+ Limited by format, the model can only export an *.obj mesh with vertex colors; the "texture" mode only works with *.glb.<br>
81
+ Please click "Do Rendering" to export a GIF.<br>
82
+ You can click "Do Baking" to bake the multi-view images onto the shape.<br>
83
+
84
+ If the results aren't satisfactory, please try a different random seed (default is 0).
85
+ '''
86
 
87
  ################################################################
88
  # prepare text examples and image examples
 
127
  )
128
  worker_gif = GifRenderer(args.device)
129
 
130
+
131
+ if BAKE_AVAILEBLE:
132
+ worker_baker = MeshBaker()
133
+
134
+
135
+ ### functional modules
136
+
137
  def stage_0_t2i(text, image, seed, step):
138
  os.makedirs('./outputs/app_output', exist_ok=True)
139
  exists = set(int(_) for _ in os.listdir('./outputs/app_output') if not _.startswith("."))
 
158
  dst = worker_xbg(image, save_folder)
159
  return dst, save_folder
160
 
161
+ def stage_1_xbg(image, save_folder, force_remove):
162
  if isinstance(image, str):
163
  image = Image.open(image)
164
  dst = save_folder + '/img_nobg.png'
165
+ rgba = worker_xbg(image, force=force_remove)
166
  rgba.save(dst)
167
  return dst
168
 
 
186
  seed,
187
  save_folder,
188
  target_face_count = 30000,
189
+ texture_color = 'texture'
 
190
  ):
191
+ do_texture_mapping = texture_color == 'texture'
 
 
192
  worker_v23(
193
  views_pil,
194
  cond_pil,
 
197
  target_face_count = target_face_count,
198
  do_texture_mapping = do_texture_mapping
199
  )
200
+ glb_dst = save_folder + '/mesh.glb' if do_texture_mapping else None
201
+ obj_dst = save_folder + '/mesh.obj'
202
+ obj_dst = save_folder + '/mesh_vertex_colors.obj'  # gradio can only show vertex shading for OBJ files
203
  return obj_dst, glb_dst
204
 
205
+ def stage_3p_baking(save_folder, color, bake):
206
+ if color == "texture" and bake:
207
+ obj_dst = worker_baker(save_folder)
208
+ glb_dst = obj_dst.replace(".obj", ".glb")
209
+ return glb_dst
210
+ else:
211
+ return None
212
+
213
+ def stage_4_gif(save_folder, color, bake, render):
214
+ if not render: return None
215
+ if os.path.exists(save_folder + '/view_1/bake/mesh.obj'):
216
+ obj_dst = save_folder + '/view_1/bake/mesh.obj'
217
+ elif os.path.exists(save_folder + '/view_0/bake/mesh.obj'):
218
+ obj_dst = save_folder + '/view_0/bake/mesh.obj'
219
+ elif os.path.exists(save_folder + '/mesh.obj'):
220
+ obj_dst = save_folder + '/mesh.obj'
221
+ else:
222
+ print(save_folder)
223
+ raise FileNotFoundError("mesh obj file not found")
224
+ gif_dst = obj_dst.replace(".obj", ".gif")
225
+ worker_gif(obj_dst, gif_dst_path=gif_dst)
226
  return gif_dst
227
+
228
+
229
+ def check_image_available(image):
230
+ if image.mode == "RGBA":
231
+ data = np.array(image)
232
+ alpha_channel = data[:, :, 3]
233
+ unique_alpha_values = np.unique(alpha_channel)
234
+ if len(unique_alpha_values) == 1:
235
+ msg = "The alpha channel is missing or invalid. The background removal option is selected for you."
236
+ return msg, gr.update(value=True, interactive=False)
237
+ else:
238
+ msg = "The image has four channels, and you can choose to remove the background or not."
239
+ return msg, gr.update(value=False, interactive=True)
240
+ elif image.mode == "RGB":
241
+ msg = "The alpha channel is missing or invalid. The background removal option is selected for you."
242
+ return msg, gr.update(value=True, interactive=False)
243
+ else:
244
+ raise Exception("Image Error")
245
+
246
+ def update_bake_render(color):
247
+ if color == "vertex":
248
+ return gr.update(value=False, interactive=False), gr.update(value=False, interactive=False)
249
+ else:
250
+ return gr.update(interactive=True), gr.update(interactive=True)
251
+
252
  # ===============================================================
253
  # gradio display
254
  # ===============================================================
255
+
256
  with gr.Blocks() as demo:
257
  gr.Markdown(CONST_HEADER)
258
  with gr.Row(variant="panel"):
259
+
260
+ ###### Input region
261
+
262
  with gr.Column(scale=2):
263
+
264
+ ### Text input region
265
+
266
  with gr.Tab("Text to 3D"):
267
  with gr.Column():
268
+ text = gr.TextArea('一只黑白相间的熊猫在白色背景上居中坐着,呈现出卡通风格和可爱氛围。',
269
+ lines=3, max_lines=20, label='Input text')
270
  with gr.Row():
271
+ textgen_color = gr.Radio(choices=["vertex", "texture"], label="Color", value="texture")
272
+ with gr.Row():
273
+ textgen_render = gr.Checkbox(label="Do Rendering", value=True, interactive=True)
274
+ if BAKE_AVAILEBLE:
275
+ textgen_bake = gr.Checkbox(label="Do Baking", value=True, interactive=True)
276
+ else:
277
+ textgen_bake = gr.Checkbox(label="Do Baking", value=False, interactive=False)
278
+
279
+ textgen_color.change(
280
+ fn=update_bake_render,
281
+ inputs=textgen_color,
282
+ outputs=[textgen_bake, textgen_render]
283
+ )
284
+
285
+ with gr.Row():
286
+ textgen_seed = gr.Number(value=0, label="T2I seed", precision=0, interactive=True)
287
+ textgen_step = gr.Number(value=25, label="T2I steps", precision=0,
288
+ minimum=10, maximum=50, interactive=True)
289
+ textgen_SEED = gr.Number(value=0, label="Gen seed", precision=0, interactive=True)
290
+ textgen_STEP = gr.Number(value=50, label="Gen steps", precision=0,
291
+ minimum=40, maximum=100, interactive=True)
292
+ textgen_max_faces = gr.Number(value=90000, label="Face number", precision=0,
293
+ minimum=5000, maximum=1000000, interactive=True)
294
  with gr.Row():
 
 
295
  textgen_submit = gr.Button("Generate", variant="primary")
296
 
297
  with gr.Row():
298
+ gr.Examples(examples=example_ts, inputs=[text], label="Text examples", examples_per_page=10)
299
 
300
+ ### Image input region
301
+
302
  with gr.Tab("Image to 3D"):
303
+ with gr.Row():
304
+ input_image = gr.Image(label="Input image", width=256, height=256, type="pil",
305
+ image_mode="RGBA", sources="upload", interactive=True)
306
+ with gr.Row():
307
+ alert_message = gr.Markdown("") # for warning
308
+ with gr.Row():
309
+ imggen_color = gr.Radio(choices=["vertex", "texture"], label="Color", value="texture")
310
+ with gr.Row():
311
+ imggen_removebg = gr.Checkbox(label="Remove Background", value=True, interactive=True)
312
+ imggen_render = gr.Checkbox(label="Do Rendering", value=True, interactive=True)
313
+ if BAKE_AVAILEBLE:
314
+ imggen_bake = gr.Checkbox(label="Do Baking", value=True, interactive=True)
315
+ else:
316
+ imggen_bake = gr.Checkbox(label="Do Baking", value=False, interactive=False)
317
+
318
+ input_image.change(
319
+ fn=check_image_available,
320
+ inputs=input_image,
321
+ outputs=[alert_message, imggen_removebg]
322
+ )
323
+ imggen_color.change(
324
+ fn=update_bake_render,
325
+ inputs=imggen_color,
326
+ outputs=[imggen_bake, imggen_render]
327
+ )
328
+
329
+ with gr.Row():
330
+ imggen_SEED = gr.Number(value=0, label="Gen seed", precision=0, interactive=True)
331
+ imggen_STEP = gr.Number(value=50, label="Gen steps", precision=0,
332
+ minimum=40, maximum=100, interactive=True)
333
+ imggen_max_faces = gr.Number(value=90000, label="Face number", precision=0,
334
+ minimum=5000, maximum=1000000, interactive=True)
335
+ with gr.Row():
336
+ imggen_submit = gr.Button("Generate", variant="primary")
337
+
338
+ with gr.Row():
339
+ gr.Examples(examples=example_is, inputs=[input_image],
340
+ label="Img examples", examples_per_page=10)
341
+
342
+ gr.Markdown(CONST_NOTE)
343
+
344
+ ###### Output region
345
 
346
  with gr.Column(scale=3):
347
  with gr.Row():
348
  with gr.Column(scale=2):
349
+ rem_bg_image = gr.Image(
350
+ label="Image without background",
351
+ type="pil",
352
+ image_mode="RGBA",
353
+ interactive=False
354
+ )
355
  with gr.Column(scale=3):
356
+ result_image = gr.Image(
357
+ label="Multi-view images",
358
+ type="pil",
359
+ interactive=False
360
+ )
361
+
362
+ with gr.Row():
363
  result_3dobj = gr.Model3D(
364
  clear_color=[0.0, 0.0, 0.0, 0.0],
365
+ label="OBJ vertex color",
366
  show_label=True,
367
  visible=True,
368
  camera_position=[90, 90, None],
369
  interactive=False
370
  )
371
+ result_gif = gr.Image(label="GIF", interactive=False)
372
+
373
+ with gr.Row():
374
+ result_3dglb_texture = gr.Model3D(
375
+ clear_color=[0.0, 0.0, 0.0, 0.0],
376
+ label="GLB texture color",
377
+ show_label=True,
378
+ visible=True,
379
+ camera_position=[90, 90, None],
380
+ interactive=False)
381
 
382
+ result_3dglb_baked = gr.Model3D(
383
  clear_color=[0.0, 0.0, 0.0, 0.0],
384
+ label="GLB baked color",
385
  show_label=True,
386
  visible=True,
387
  camera_position=[90, 90, None],
388
+ interactive=False)
 
 
389
 
390
+ with gr.Row():
391
+ gr.Markdown(
392
+ "Due to Gradio limitations, OBJ files are displayed with vertex shading only, "
393
+ "while GLB files can be viewed with texture shading. <br>For the best experience, "
394
+ "we recommend downloading the GLB files and opening them with 3D software "
395
+ "like Blender or MeshLab."
396
+ )
397
 
398
+ #===============================================================
399
+ # gradio running code
400
+ #===============================================================
401
+
402
  none = gr.State(None)
403
  save_folder = gr.State()
404
  cond_image = gr.State()
405
  views_image = gr.State()
406
  text_image = gr.State()
407
 
408
+
409
  textgen_submit.click(
410
+ fn=stage_0_t2i,
411
+ inputs=[text, none, textgen_seed, textgen_step],
412
  outputs=[rem_bg_image, save_folder],
413
  ).success(
414
+ fn=stage_2_i2v,
415
+ inputs=[rem_bg_image, textgen_SEED, textgen_STEP, save_folder],
416
  outputs=[views_image, cond_image, result_image],
417
  ).success(
418
+ fn=stage_3_v23,
419
+ inputs=[views_image, cond_image, textgen_SEED, save_folder, textgen_max_faces, textgen_color],
420
+ outputs=[result_3dobj, result_3dglb_texture],
 
421
  ).success(
422
+ fn=stage_3p_baking,
423
+ inputs=[save_folder, textgen_color, textgen_bake],
424
+ outputs=[result_3dglb_baked],
425
+ ).success(
426
+ fn=stage_4_gif,
427
+ inputs=[save_folder, textgen_color, textgen_bake, textgen_render],
428
  outputs=[result_gif],
429
  ).success(lambda: print('Text_to_3D Done ...'))
430
 
431
+
432
  imggen_submit.click(
433
+ fn=stage_0_t2i,
434
+ inputs=[none, input_image, textgen_seed, textgen_step],
435
  outputs=[text_image, save_folder],
436
  ).success(
437
+ fn=stage_1_xbg,
438
+ inputs=[text_image, save_folder, imggen_removebg],
439
  outputs=[rem_bg_image],
440
  ).success(
441
+ fn=stage_2_i2v,
442
+ inputs=[rem_bg_image, imggen_SEED, imggen_STEP, save_folder],
443
  outputs=[views_image, cond_image, result_image],
444
  ).success(
445
+ fn=stage_3_v23,
446
+ inputs=[views_image, cond_image, imggen_SEED, save_folder, imggen_max_faces, imggen_color],
447
+ outputs=[result_3dobj, result_3dglb_texture],
448
+ ).success(
449
+ fn=stage_3p_baking,
450
+ inputs=[save_folder, imggen_color, imggen_bake],
451
+ outputs=[result_3dglb_baked],
452
  ).success(
453
+ fn=stage_4_gif,
454
+ inputs=[save_folder, imggen_color, imggen_bake, imggen_render],
455
  outputs=[result_gif],
456
  ).success(lambda: print('Image_to_3D Done ...'))
457
 
458
+ #===============================================================
459
+ # start gradio server
460
+ #===============================================================
461
 
 
462
  demo.queue(max_size=CONST_MAX_QUEUE)
463
  demo.launch(server_name=CONST_SERVER, server_port=CONST_PORT)
464
 
env_install.sh CHANGED
@@ -1,6 +1,6 @@
1
  pip3 install diffusers transformers
2
  pip3 install rembg tqdm omegaconf matplotlib opencv-python imageio jaxtyping einops
3
- pip3 install SentencePiece accelerate trimesh PyMCubes xatlas libigl ninja gradio
4
  pip3 install git+https://github.com/facebookresearch/pytorch3d@stable
5
  pip3 install git+https://github.com/NVlabs/nvdiffrast
6
  pip3 install open3d
 
1
  pip3 install diffusers transformers
2
  pip3 install rembg tqdm omegaconf matplotlib opencv-python imageio jaxtyping einops
3
+ pip3 install SentencePiece accelerate trimesh PyMCubes xatlas libigl ninja gradio roma
4
  pip3 install git+https://github.com/facebookresearch/pytorch3d@stable
5
  pip3 install git+https://github.com/NVlabs/nvdiffrast
6
  pip3 install open3d
infer/gif_render.py CHANGED
@@ -25,7 +25,7 @@
25
  import os, sys
26
  sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
27
 
28
- from svrm.ldm.vis_util import render
29
  from infer.utils import seed_everything, timing_decorator
30
 
31
  class GifRenderer():
@@ -40,14 +40,14 @@ class GifRenderer():
40
  self,
41
  obj_filename,
42
  elev=0,
43
- azim=0,
44
  resolution=512,
45
  gif_dst_path='',
46
  n_views=120,
47
  fps=30,
48
  rgb=True
49
  ):
50
- render(
51
  obj_filename,
52
  elev=elev,
53
  azim=azim,
 
25
  import os, sys
26
  sys.path.insert(0, f"{os.path.dirname(os.path.dirname(os.path.abspath(__file__)))}")
27
 
28
+ from svrm.ldm.vis_util import render_func
29
  from infer.utils import seed_everything, timing_decorator
30
 
31
  class GifRenderer():
 
40
  self,
41
  obj_filename,
42
  elev=0,
43
+ azim=None,
44
  resolution=512,
45
  gif_dst_path='',
46
  n_views=120,
47
  fps=30,
48
  rgb=True
49
  ):
50
+ render_func(
51
  obj_filename,
52
  elev=elev,
53
  azim=azim,
infer/image_to_views.py CHANGED
@@ -48,21 +48,26 @@ def save_gif(pils, save_path, df=False):
48
 
49
 
50
  class Image2Views():
51
- def __init__(self, device="cuda:0", use_lite=False, save_memory=False):
52
  self.device = device
53
  if use_lite:
 
54
  self.pipe = Hunyuan3d_MVD_Lite_Pipeline.from_pretrained(
55
- "./weights/mvd_lite",
56
  torch_dtype = torch.float16,
57
  use_safetensors = True,
58
  )
59
  else:
 
60
  self.pipe = HunYuan3D_MVD_Std_Pipeline.from_pretrained(
61
- "./weights/mvd_std",
62
  torch_dtype = torch.float16,
63
  use_safetensors = True,
64
  )
65
- self.pipe = self.pipe.to(device)
66
  self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
67
  self.save_memory = save_memory
68
  set_parameter_grad_false(self.pipe.unet)
 
48
 
49
 
50
  class Image2Views():
51
+ def __init__(self,
52
+ device="cuda:0", use_lite=False, save_memory=False,
53
+ std_pretrain='./weights/mvd_std', lite_pretrain='./weights/mvd_lite'
54
+ ):
55
  self.device = device
56
  if use_lite:
57
+ print("loading", lite_pretrain)
58
  self.pipe = Hunyuan3d_MVD_Lite_Pipeline.from_pretrained(
59
+ lite_pretrain,
60
  torch_dtype = torch.float16,
61
  use_safetensors = True,
62
  )
63
  else:
64
+ print("loadding", std_pretrain)
65
  self.pipe = HunYuan3D_MVD_Std_Pipeline.from_pretrained(
66
+ std_pretrain,
67
  torch_dtype = torch.float16,
68
  use_safetensors = True,
69
  )
70
+ self.pipe = self.pipe if save_memory else self.pipe.to(device)
71
  self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
72
  self.save_memory = save_memory
73
  set_parameter_grad_false(self.pipe.unet)
infer/text_to_image.py CHANGED
@@ -46,8 +46,7 @@ class Text2Image():
46
  )
47
  set_parameter_grad_false(self.pipe.transformer)
48
  print('text2image transformer model', get_parameter_number(self.pipe.transformer))
49
- if not save_memory:
50
- self.pipe = self.pipe.to(device)
51
  self.neg_txt = "文本,特写,裁剪,出框,最差质量,低质量,JPEG伪影,PGLY,重复,病态,残缺,多余的手指,变异的手," \
52
  "画得不好的手,画得不好的脸,变异,畸形,模糊,脱水,糟糕的解剖学,糟糕的比例,多余的肢体,克隆的脸," \
53
  "毁容,恶心的比例,畸形的肢体,缺失的手臂,缺失的腿,额外的手臂,额外的腿,融合的手指,手指太多,长脖子"
 
46
  )
47
  set_parameter_grad_false(self.pipe.transformer)
48
  print('text2image transformer model', get_parameter_number(self.pipe.transformer))
49
+ self.pipe = self.pipe if save_memory else self.pipe.to(device)
 
50
  self.neg_txt = "文本,特写,裁剪,出框,最差质量,低质量,JPEG伪影,PGLY,重复,病态,残缺,多余的手指,变异的手," \
51
  "画得不好的手,画得不好的脸,变异,畸形,模糊,脱水,糟糕的解剖学,糟糕的比例,多余的肢体,克隆的脸," \
52
  "毁容,恶心的比例,畸形的肢体,缺失的手臂,缺失的腿,额外的手臂,额外的腿,融合的手指,手指太多,长脖子"
infer/utils.py CHANGED
@@ -21,7 +21,8 @@
21
  # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
  # fine-tuning enabling code and other elements of the foregoing made publicly available
23
  # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
-
 
25
  import os
26
  import time
27
  import random
@@ -30,6 +31,7 @@ import torch
30
  from torch.cuda.amp import autocast, GradScaler
31
  from functools import wraps
32
 
 
33
  def seed_everything(seed):
34
  '''
35
  seed everthing
@@ -39,6 +41,7 @@ def seed_everything(seed):
39
  torch.manual_seed(seed)
40
  os.environ["PL_GLOBAL_SEED"] = str(seed)
41
 
 
42
  def timing_decorator(category: str):
43
  '''
44
  timing_decorator: record time
@@ -57,6 +60,7 @@ def timing_decorator(category: str):
57
  return wrapper
58
  return decorator
59
 
 
60
  def auto_amp_inference(func):
61
  '''
62
  with torch.cuda.amp.autocast()"
@@ -69,11 +73,13 @@ def auto_amp_inference(func):
69
  return output
70
  return wrapper
71
 
 
72
  def get_parameter_number(model):
73
  total_num = sum(p.numel() for p in model.parameters())
74
  trainable_num = sum(p.numel() for p in model.parameters() if p.requires_grad)
75
  return {'Total': total_num, 'Trainable': trainable_num}
76
 
 
77
  def set_parameter_grad_false(model):
78
  for p in model.parameters():
79
  p.requires_grad = False
 
21
  # optimizer states), machine-learning model code, inference-enabling code, training-enabling code,
22
  # fine-tuning enabling code and other elements of the foregoing made publicly available
23
  # by Tencent in accordance with TENCENT HUNYUAN COMMUNITY LICENSE AGREEMENT.
24
+ import sys
25
+ import io
26
  import os
27
  import time
28
  import random
 
31
  from torch.cuda.amp import autocast, GradScaler
32
  from functools import wraps
33
 
34
+
35
  def seed_everything(seed):
36
  '''
37
  seed everthing
 
41
  torch.manual_seed(seed)
42
  os.environ["PL_GLOBAL_SEED"] = str(seed)
43
 
44
+
45
  def timing_decorator(category: str):
46
  '''
47
  timing_decorator: record time
 
60
  return wrapper
61
  return decorator
62
 
63
+
64
  def auto_amp_inference(func):
65
  '''
66
  with torch.cuda.amp.autocast()"
 
73
  return output
74
  return wrapper
75
 
76
+
77
  def get_parameter_number(model):
78
  total_num = sum(p.numel() for p in model.parameters())
79
  trainable_num = sum(p.numel() for p in model.parameters() if p.requires_grad)
80
  return {'Total': total_num, 'Trainable': trainable_num}
81
 
82
+
83
  def set_parameter_grad_false(model):
84
  for p in model.parameters():
85
  p.requires_grad = False
infer/views_to_mesh.py CHANGED
@@ -47,11 +47,15 @@ class Views2Mesh():
47
  use_lite: lite version
48
  save_memory: cpu auto
49
  '''
50
- self.mv23d_predictor = MV23DPredictor(mv23d_ckt_path, mv23d_cfg_path, device=device)
51
- self.mv23d_predictor.model.eval()
52
- self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
53
  self.device = device
54
  self.save_memory = save_memory
55
  set_parameter_grad_false(self.mv23d_predictor.model)
56
  print('view2mesh model', get_parameter_number(self.mv23d_predictor.model))
57
 
@@ -109,7 +113,6 @@ class Views2Mesh():
109
  do_texture_mapping = do_texture_mapping
110
  )
111
  torch.cuda.empty_cache()
112
- return save_dir
113
 
114
 
115
  if __name__ == "__main__":
 
47
  use_lite: lite version
48
  save_memory: cpu auto
49
  '''
50
  self.device = device
51
  self.save_memory = save_memory
52
+ self.mv23d_predictor = MV23DPredictor(
53
+ mv23d_ckt_path,
54
+ mv23d_cfg_path,
55
+ device = "cpu" if save_memory else device
56
+ )
57
+ self.mv23d_predictor.model.eval()
58
+ self.order = [0, 1, 2, 3, 4, 5] if use_lite else [0, 2, 4, 5, 3, 1]
59
  set_parameter_grad_false(self.mv23d_predictor.model)
60
  print('view2mesh model', get_parameter_number(self.mv23d_predictor.model))
61
 
 
113
  do_texture_mapping = do_texture_mapping
114
  )
115
  torch.cuda.empty_cache()
 
116
 
117
 
118
  if __name__ == "__main__":
main.py CHANGED
@@ -24,16 +24,28 @@
24
 
25
  import os
26
  import warnings
27
- import torch
28
- from PIL import Image
29
  import argparse
30
-
31
- from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
 
32
 
33
  warnings.simplefilter('ignore', category=UserWarning)
34
  warnings.simplefilter('ignore', category=FutureWarning)
35
  warnings.simplefilter('ignore', category=DeprecationWarning)
36
 
37
  def get_args():
38
  parser = argparse.ArgumentParser()
39
  parser.add_argument(
@@ -73,8 +85,8 @@ def get_args():
73
  "--gen_steps", default=50, type=int
74
  )
75
  parser.add_argument(
76
- "--max_faces_num", default=80000, type=int,
77
- help="max num of face, suggest 80000 for effect, 10000 for speed"
78
  )
79
  parser.add_argument(
80
  "--save_memory", default=False, action="store_true"
@@ -85,6 +97,13 @@ def get_args():
85
  parser.add_argument(
86
  "--do_render", default=False, action="store_true"
87
  )
88
  return parser.parse_args()
89
 
90
 
@@ -95,6 +114,7 @@ if __name__ == "__main__":
95
  assert args.text_prompt or args.image_prompt, "Text and image can only be given to one"
96
 
97
  # init model
 
98
  rembg_model = Removebg()
99
  image_to_views_model = Image2Views(
100
  device=args.device,
@@ -116,9 +136,18 @@ if __name__ == "__main__":
116
  device = args.device,
117
  save_memory = args.save_memory
118
  )
119
- if args.do_render:
120
  gif_renderer = GifRenderer(device=args.device)
121
-
 
 
122
  # ---- ----- ---- ---- ---- ----
123
 
124
  os.makedirs(args.save_folder, exist_ok=True)
@@ -136,7 +165,7 @@ if __name__ == "__main__":
136
 
137
  # stage 2, remove back ground
138
  res_rgba_pil = rembg_model(res_rgb_pil)
139
- res_rgb_pil.save(os.path.join(args.save_folder, "img_nobg.png"))
140
 
141
  # stage 3, image to views
142
  (views_grid_pil, cond_img), view_pil_list = image_to_views_model(
@@ -155,10 +184,29 @@ if __name__ == "__main__":
155
  save_folder = args.save_folder,
156
  do_texture_mapping = args.do_texture_mapping
157
  )
158
-
159
- # stage 5, render gif
160
  if args.do_render:
161
  gif_renderer(
162
- os.path.join(args.save_folder, 'mesh.obj'),
163
  gif_dst_path = os.path.join(args.save_folder, 'output.gif'),
164
  )
 
24
 
25
  import os
26
  import warnings
 
 
27
  import argparse
28
+ import time
29
+ from PIL import Image
30
+ import torch
31
 
32
  warnings.simplefilter('ignore', category=UserWarning)
33
  warnings.simplefilter('ignore', category=FutureWarning)
34
  warnings.simplefilter('ignore', category=DeprecationWarning)
35
 
36
+ from infer import Text2Image, Removebg, Image2Views, Views2Mesh, GifRenderer
37
+ from third_party.mesh_baker import MeshBaker
38
+ from third_party.check import check_bake_available
39
+
40
+ try:
41
+ from third_party.mesh_baker import MeshBaker
42
+ assert check_bake_available()
43
+ BAKE_AVAILEBLE = True
44
+ except Exception as err:
45
+ print(err)
46
+ print("import baking related fail, run without baking")
47
+ BAKE_AVAILEBLE = False
48
+
49
  def get_args():
50
  parser = argparse.ArgumentParser()
51
  parser.add_argument(
 
85
  "--gen_steps", default=50, type=int
86
  )
87
  parser.add_argument(
88
+ "--max_faces_num", default=90000, type=int,
89
+ help="max num of face, suggest 90000 for effect, 10000 for speed"
90
  )
91
  parser.add_argument(
92
  "--save_memory", default=False, action="store_true"
 
97
  parser.add_argument(
98
  "--do_render", default=False, action="store_true"
99
  )
100
+ parser.add_argument(
101
+ "--do_bake", default=False, action="store_true"
102
+ )
103
+ parser.add_argument(
104
+ "--bake_align_times", default=3, type=int,
105
+ help="align times between view image and mesh, suggest 1~6"
106
+ )
107
  return parser.parse_args()
108
 
109
 
 
114
  assert args.text_prompt or args.image_prompt, "Text and image can only be given to one"
115
 
116
  # init model
117
+ st = time.time()
118
  rembg_model = Removebg()
119
  image_to_views_model = Image2Views(
120
  device=args.device,
 
136
  device = args.device,
137
  save_memory = args.save_memory
138
  )
139
+
140
+ if args.do_bake and BAKE_AVAILEBLE:
141
+ mesh_baker = MeshBaker(
142
+ device = args.device,
143
+ align_times = args.bake_align_times
144
+ )
145
+
146
+ if check_bake_available():
147
  gif_renderer = GifRenderer(device=args.device)
148
+
149
+ print(f"Init Models cost {time.time()-st}s")
150
+
151
  # ---- ----- ---- ---- ---- ----
152
 
153
  os.makedirs(args.save_folder, exist_ok=True)
 
165
 
166
  # stage 2, remove back ground
167
  res_rgba_pil = rembg_model(res_rgb_pil)
168
+ res_rgba_pil.save(os.path.join(args.save_folder, "img_nobg.png"))
169
 
170
  # stage 3, image to views
171
  (views_grid_pil, cond_img), view_pil_list = image_to_views_model(
 
184
  save_folder = args.save_folder,
185
  do_texture_mapping = args.do_texture_mapping
186
  )
187
+
188
+ # stage 5, baking
189
+ mesh_file_for_render = None
190
+ if args.do_bake and BAKE_AVAILEBLE:
191
+ mesh_file_for_render = mesh_baker(args.save_folder)
192
+
193
+ # stage 6, render gif
194
+ # TODO: if the output folder is not cleaned beforehand, a stale mesh may be rendered by mistake
195
  if args.do_render:
196
+ if mesh_file_for_render and os.path.exists(mesh_file_for_render):
197
+ mesh_file_for_render = mesh_file_for_render
198
+ elif os.path.exists(os.path.join(args.save_folder, 'view_1/bake/mesh.obj')):
199
+ mesh_file_for_render = os.path.join(args.save_folder, 'view_1/bake/mesh.obj')
200
+ elif os.path.exists(os.path.join(args.save_folder, 'view_0/bake/mesh.obj')):
201
+ mesh_file_for_render = os.path.join(args.save_folder, 'view_0/bake/mesh.obj')
202
+ elif os.path.exists(os.path.join(args.save_folder, 'mesh.obj')):
203
+ mesh_file_for_render = os.path.join(args.save_folder, 'mesh.obj')
204
+ else:
205
+ raise FileNotFoundError("mesh_file_for_render not found")
206
+
207
+ print("Rendering 3d file:", mesh_file_for_render)
208
+
209
  gif_renderer(
210
+ mesh_file_for_render,
211
  gif_dst_path = os.path.join(args.save_folder, 'output.gif'),
212
  )
requirements.txt CHANGED
@@ -22,3 +22,4 @@ git+https://github.com/facebookresearch/pytorch3d@stable
22
  git+https://github.com/NVlabs/nvdiffrast
23
  open3d
24
  ninja
 
 
22
  git+https://github.com/NVlabs/nvdiffrast
23
  open3d
24
  ninja
25
+ roma
svrm/ldm/models/svrm.py CHANGED
@@ -46,7 +46,7 @@ from ..modules.rendering_neus.rasterize import NVDiffRasterizerContext
46
 
47
  from ..utils.ops import scale_tensor
48
  from ..util import count_params, instantiate_from_config
49
- from ..vis_util import render
50
 
51
 
52
  def unwrap_uv(v_pos, t_pos_idx):
@@ -58,7 +58,6 @@ def unwrap_uv(v_pos, t_pos_idx):
58
  indices = indices.astype(np.int64, casting="same_kind")
59
  return uvs, indices
60
 
61
-
62
  def uv_padding(image, hole_mask, uv_padding_size = 2):
63
  return cv2.inpaint(
64
  (image.detach().cpu().numpy() * 255).astype(np.uint8),
@@ -120,14 +119,16 @@ class SVRMModel(torch.nn.Module):
120
  out_dir = 'outputs/test'
121
  ):
122
  """
123
- color_type: 0 for ray texture, 1 for vertices texture
124
  """
125
 
126
- obj_vertext_path = os.path.join(out_dir, 'mesh_with_colors.obj')
127
- obj_path = os.path.join(out_dir, 'mesh.obj')
128
- obj_texture_path = os.path.join(out_dir, 'texture.png')
129
- obj_mtl_path = os.path.join(out_dir, 'texture.mtl')
130
- glb_path = os.path.join(out_dir, 'mesh.glb')
 
 
131
 
132
  st = time.time()
133
 
@@ -204,15 +205,13 @@ class SVRMModel(torch.nn.Module):
204
  mesh = trimesh.load_mesh(obj_vertext_path)
205
  print(f"=====> generate mesh with vertex shading time: {time.time() - st}")
206
  st = time.time()
207
-
208
  if not do_texture_mapping:
209
- shutil.copy(obj_vertext_path, obj_path)
210
- mesh.export(glb_path, file_type='glb')
211
- return None
212
 
213
-
214
- ########## export texture ########
215
-
216
 
217
  st = time.time()
218
 
@@ -238,12 +237,9 @@ class SVRMModel(torch.nn.Module):
238
 
239
  # Interpolate world space position
240
  gb_pos = ctx.interpolate_one(vtx_refine, rast[None, ...], faces_refine)[0][0]
241
-
242
  with torch.no_grad():
243
  gb_mask_pos_scale = scale_tensor(gb_pos.unsqueeze(0).view(1, -1, 3), (-1, 1), (-1, 1))
244
-
245
  tex_map = self.render.forward_points(cur_triplane, gb_mask_pos_scale)['rgb']
246
-
247
  tex_map = tex_map.float().squeeze(0) # (0, 1)
248
  tex_map = tex_map.view((texture_res, texture_res, 3))
249
  img = uv_padding(tex_map, hole_mask)
@@ -257,7 +253,7 @@ class SVRMModel(torch.nn.Module):
257
  fid.write('newmtl material_0\n')
258
  fid.write("Ka 1.000 1.000 1.000\n")
259
  fid.write("Kd 1.000 1.000 1.000\n")
260
- fid.write("Ks 0.000 0.000 0.000\n")
261
  fid.write("d 1.0\n")
262
  fid.write("illum 2\n")
263
  fid.write(f'map_Kd texture.png\n')
@@ -278,4 +274,5 @@ class SVRMModel(torch.nn.Module):
278
  mesh = trimesh.load_mesh(obj_path)
279
  mesh.export(glb_path, file_type='glb')
280
  print(f"=====> generate mesh with texture shading time: {time.time() - st}")
 
281
 
 
46
 
47
  from ..utils.ops import scale_tensor
48
  from ..util import count_params, instantiate_from_config
49
+ from ..vis_util import render_func
50
 
51
 
52
  def unwrap_uv(v_pos, t_pos_idx):
 
58
  indices = indices.astype(np.int64, casting="same_kind")
59
  return uvs, indices
60
 
 
61
  def uv_padding(image, hole_mask, uv_padding_size = 2):
62
  return cv2.inpaint(
63
  (image.detach().cpu().numpy() * 255).astype(np.uint8),
 
119
  out_dir = 'outputs/test'
120
  ):
121
  """
122
+ do_texture_mapping: True for ray texture, False for vertices texture
123
  """
124
 
125
+ obj_vertext_path = os.path.join(out_dir, 'mesh_vertex_colors.obj')
126
+
127
+ if do_texture_mapping:
128
+ obj_path = os.path.join(out_dir, 'mesh.obj')
129
+ obj_texture_path = os.path.join(out_dir, 'texture.png')
130
+ obj_mtl_path = os.path.join(out_dir, 'texture.mtl')
131
+ glb_path = os.path.join(out_dir, 'mesh.glb')
132
 
133
  st = time.time()
134
 
 
205
  mesh = trimesh.load_mesh(obj_vertext_path)
206
  print(f"=====> generate mesh with vertex shading time: {time.time() - st}")
207
  st = time.time()
208
+
209
  if not do_texture_mapping:
210
+ return obj_vertext_path, None
 
 
211
 
212
+ ###########################################################
213
+ #------------- export texture -----------------------
214
+ ###########################################################
215
 
216
  st = time.time()
217
 
 
237
 
238
  # Interpolate world space position
239
  gb_pos = ctx.interpolate_one(vtx_refine, rast[None, ...], faces_refine)[0][0]
 
240
  with torch.no_grad():
241
  gb_mask_pos_scale = scale_tensor(gb_pos.unsqueeze(0).view(1, -1, 3), (-1, 1), (-1, 1))
 
242
  tex_map = self.render.forward_points(cur_triplane, gb_mask_pos_scale)['rgb']
 
243
  tex_map = tex_map.float().squeeze(0) # (0, 1)
244
  tex_map = tex_map.view((texture_res, texture_res, 3))
245
  img = uv_padding(tex_map, hole_mask)
 
253
  fid.write('newmtl material_0\n')
254
  fid.write("Ka 1.000 1.000 1.000\n")
255
  fid.write("Kd 1.000 1.000 1.000\n")
256
+ fid.write("Ks 0.500 0.500 0.500\n")
257
  fid.write("d 1.0\n")
258
  fid.write("illum 2\n")
259
  fid.write(f'map_Kd texture.png\n')
 
274
  mesh = trimesh.load_mesh(obj_path)
275
  mesh.export(glb_path, file_type='glb')
276
  print(f"=====> generate mesh with texture shading time: {time.time() - st}")
277
+ return obj_path, glb_path
278
 
svrm/ldm/modules/attention.py CHANGED
@@ -246,8 +246,11 @@ class CrossAttention(nn.Module):
246
  class FlashAttention(nn.Module):
247
  def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0.):
248
  super().__init__()
249
- print(f"Setting up {self.__class__.__name__}. Query dim is {query_dim}, context_dim is {context_dim} and using "
250
- f"{heads} heads.")
251
  inner_dim = dim_head * heads
252
  context_dim = default(context_dim, query_dim)
253
  self.scale = dim_head ** -0.5
@@ -269,7 +272,12 @@ class FlashAttention(nn.Module):
269
  k = self.to_k(context).to(dtype)
270
  v = self.to_v(context).to(dtype)
271
  q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q, k, v)) # q is [b, 3079, 16, 64]
272
- out = flash_attn_func(q, k, v, dropout_p=self.dropout, softmax_scale=None, causal=False, window_size=(-1, -1)) # out is same shape to q
 
 
 
 
 
273
  out = rearrange(out, 'b n h d -> b n (h d)', h=h)
274
  return self.to_out(out.float())
275
 
@@ -277,8 +285,11 @@ class MemoryEfficientCrossAttention(nn.Module):
277
  # https://github.com/MatthieuTPHR/diffusers/blob/d80b531ff8060ec1ea982b65a1b8df70f73aa67c/src/diffusers/models/attention.py#L223
278
  def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0.0):
279
  super().__init__()
280
- print(f"Setting up {self.__class__.__name__}. Query dim is {query_dim}, context_dim is {context_dim} and using "
281
- f"{heads} heads.")
 
 
 
282
  inner_dim = dim_head * heads
283
  context_dim = default(context_dim, query_dim)
284
 
@@ -327,10 +338,12 @@ class BasicTransformerBlock(nn.Module):
327
  super().__init__()
328
  self.disable_self_attn = disable_self_attn
329
  self.attn1 = CrossAttention(query_dim=dim, heads=n_heads, dim_head=d_head, dropout=dropout,
330
- context_dim=context_dim if self.disable_self_attn else None) # is a self-attention if not self.disable_self_attn
 
331
  self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff)
332
  self.attn2 = CrossAttention(query_dim=dim, context_dim=context_dim,
333
- heads=n_heads, dim_head=d_head, dropout=dropout) # is self-attn if context is none
 
334
  self.norm1 = Fp32LayerNorm(dim)
335
  self.norm2 = Fp32LayerNorm(dim)
336
  self.norm3 = Fp32LayerNorm(dim)
@@ -451,7 +464,3 @@ class ImgToTriplaneTransformer(nn.Module):
451
  x = self.norm(x)
452
  return x
453
 
454
-
455
-
456
-
457
-
 
246
  class FlashAttention(nn.Module):
247
  def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0.):
248
  super().__init__()
249
+ # print(
250
+ # f"Setting up {self.__class__.__name__}. Query dim is {query_dim}, "
251
+ # "context_dim is {context_dim} and using "
252
+ # f"{heads} heads."
253
+ # )
254
  inner_dim = dim_head * heads
255
  context_dim = default(context_dim, query_dim)
256
  self.scale = dim_head ** -0.5
 
272
  k = self.to_k(context).to(dtype)
273
  v = self.to_v(context).to(dtype)
274
  q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> b n h d', h=h), (q, k, v)) # q is [b, 3079, 16, 64]
275
+ out = flash_attn_func(q, k, v,
276
+ dropout_p=self.dropout,
277
+ softmax_scale=None,
278
+ causal=False,
279
+ window_size=(-1, -1)
280
+ ) # out is same shape to q
281
  out = rearrange(out, 'b n h d -> b n (h d)', h=h)
282
  return self.to_out(out.float())
283
 
 
285
  # https://github.com/MatthieuTPHR/diffusers/blob/d80b531ff8060ec1ea982b65a1b8df70f73aa67c/src/diffusers/models/attention.py#L223
286
  def __init__(self, query_dim, context_dim=None, heads=8, dim_head=64, dropout=0.0):
287
  super().__init__()
288
+ # print(
289
+ # f"Setting up {self.__class__.__name__}. Query dim is {query_dim}, "
290
+ # "context_dim is {context_dim} and using "
291
+ # f"{heads} heads."
292
+ # )
293
  inner_dim = dim_head * heads
294
  context_dim = default(context_dim, query_dim)
295
 
 
338
  super().__init__()
339
  self.disable_self_attn = disable_self_attn
340
  self.attn1 = CrossAttention(query_dim=dim, heads=n_heads, dim_head=d_head, dropout=dropout,
341
+ context_dim=context_dim if self.disable_self_attn else None)
342
+ # is a self-attention if not self.disable_self_attn
343
  self.ff = FeedForward(dim, dropout=dropout, glu=gated_ff)
344
  self.attn2 = CrossAttention(query_dim=dim, context_dim=context_dim,
345
+ heads=n_heads, dim_head=d_head, dropout=dropout)
346
+ # is self-attn if context is none
347
  self.norm1 = Fp32LayerNorm(dim)
348
  self.norm2 = Fp32LayerNorm(dim)
349
  self.norm3 = Fp32LayerNorm(dim)
 
464
  x = self.norm(x)
465
  return x
466
 
 
 
 
 
svrm/ldm/vis_util.py CHANGED
@@ -27,10 +27,10 @@ from pytorch3d.renderer import (
27
  )
28
 
29
 
30
- def render(
31
  obj_filename,
32
  elev=0,
33
- azim=0,
34
  resolution=512,
35
  gif_dst_path='',
36
  n_views=120,
@@ -49,7 +49,7 @@ def render(
49
  mesh = load_objs_as_meshes([obj_filename], device=device)
50
  meshes = mesh.extend(n_views)
51
 
52
- if gif_dst_path != '':
53
  elev = torch.linspace(elev, elev, n_views+1)[:-1]
54
  azim = torch.linspace(0, 360, n_views+1)[:-1]
55
 
@@ -76,16 +76,15 @@ def render(
76
  )
77
  images = renderer(meshes)
78
 
79
- # single frame rendering
80
- if gif_dst_path == '':
81
- frame = images[0, ..., :3] if rgb else images[0, ...]
82
- frame = Image.fromarray((frame.cpu().squeeze(0) * 255).numpy().astype("uint8"))
83
- return frame
 
 
 
 
 
 
84
 
85
- # orbit frames rendering
86
- with imageio.get_writer(uri=gif_dst_path, mode='I', duration=1. / fps * 1000, loop=0) as writer:
87
- for i in range(n_views):
88
- frame = images[i, ..., :3] if rgb else images[i, ...]
89
- frame = Image.fromarray((frame.cpu().squeeze(0) * 255).numpy().astype("uint8"))
90
- writer.append_data(frame)
91
- return gif_dst_path
 
27
  )
28
 
29
 
30
+ def render_func(
31
  obj_filename,
32
  elev=0,
33
+ azim=None,
34
  resolution=512,
35
  gif_dst_path='',
36
  n_views=120,
 
49
  mesh = load_objs_as_meshes([obj_filename], device=device)
50
  meshes = mesh.extend(n_views)
51
 
52
+ if azim is None:
53
  elev = torch.linspace(elev, elev, n_views+1)[:-1]
54
  azim = torch.linspace(0, 360, n_views+1)[:-1]
55
 
 
76
  )
77
  images = renderer(meshes)
78
 
79
+ if gif_dst_path != '':
80
+ with imageio.get_writer(uri=gif_dst_path, mode='I', duration=1. / fps * 1000, loop=0) as writer:
81
+ for i in range(n_views):
82
+ frame = images[i, ..., :3] if rgb else images[i, ...]
83
+ frame = Image.fromarray((frame.cpu().squeeze(0) * 255).numpy().astype("uint8"))
84
+ writer.append_data(frame)
85
+
86
+ frame = images[..., :3] if rgb else images
87
+ frames = [Image.fromarray((fra.cpu().squeeze(0) * 255).numpy().astype("uint8")) for fra in frame]
88
+ return frames
89
+
90
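
A minimal usage sketch of the renamed `render_func` (hypothetical paths; the function returns the rendered frames as PIL images and optionally writes an orbit GIF):

```python
from svrm.ldm.vis_util import render_func

# orbit rendering: azim=None sweeps 0-360 degrees and also writes a GIF
frames = render_func("outputs/test/mesh.obj", elev=0, n_views=120,
                     gif_dst_path="outputs/test/orbit.gif")

# single front view: one camera, no GIF written
front = render_func("outputs/test/mesh.obj", elev=0, n_views=1, gif_dst_path="")[0]
```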
 
 
 
 
 
 
 
 
svrm/predictor.py CHANGED
@@ -33,7 +33,7 @@ from omegaconf import OmegaConf
33
  from torchvision import transforms
34
  from safetensors.torch import save_file, load_file
35
  from .ldm.util import instantiate_from_config
36
- from .ldm.vis_util import render
37
 
38
  class MV23DPredictor(object):
39
  def __init__(self, ckpt_path, cfg_path, elevation=15, number_view=60,
@@ -46,9 +46,7 @@ class MV23DPredictor(object):
46
  self.elevation_list = [0, 0, 0, 0, 0, 0, 0]
47
  self.azimuth_list = [0, 60, 120, 180, 240, 300, 0]
48
 
49
- st = time.time()
50
  self.model = self.init_model(ckpt_path, cfg_path)
51
- print(f"=====> mv23d model init time: {time.time() - st}")
52
 
53
  self.input_view_transform = transforms.Compose([
54
  transforms.Resize(504, interpolation=Image.BICUBIC),
 
33
  from torchvision import transforms
34
  from safetensors.torch import save_file, load_file
35
  from .ldm.util import instantiate_from_config
36
+ from .ldm.vis_util import render_func
37
 
38
  class MV23DPredictor(object):
39
  def __init__(self, ckpt_path, cfg_path, elevation=15, number_view=60,
 
46
  self.elevation_list = [0, 0, 0, 0, 0, 0, 0]
47
  self.azimuth_list = [0, 60, 120, 180, 240, 300, 0]
48
 
 
49
  self.model = self.init_model(ckpt_path, cfg_path)
 
50
 
51
  self.input_view_transform = transforms.Compose([
52
  transforms.Resize(504, interpolation=Image.BICUBIC),
third_party/check.py ADDED
@@ -0,0 +1,25 @@
1
+ import os
2
+ import sys
3
+ import io
4
+
5
+ def check_bake_available():
6
+ is_ok = os.path.exists("./third_party/weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt/model.safetensors")
7
+ is_ok = is_ok and os.path.exists("./third_party/dust3r")
8
+ is_ok = is_ok and os.path.exists("./third_party/dust3r/dust3r")
9
+ is_ok = is_ok and os.path.exists("./third_party/dust3r/croco/models")
10
+ if is_ok:
+     print("Baking is available")
+ else:
+     print("Baking is unavailable, please download the related files listed in the README")
18
+ return is_ok
19
+
20
+
21
+
22
+ if __name__ == "__main__":
23
+
24
+ check_bake_available()
25
+
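
The paths checked above correspond to the following expected layout under `third_party/` (the DUSt3R code and weights themselves are downloaded separately, as described in the README):

```
third_party/
├── dust3r/
│   ├── croco/
│   │   └── models/
│   └── dust3r/
└── weights/
    └── DUSt3R_ViTLarge_BaseDecoder_512_dpt/
        └── model.safetensors
```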
third_party/dust3r_utils.py ADDED
@@ -0,0 +1,366 @@
1
+ import sys
2
+ import io
3
+ import os
4
+ import cv2
5
+ import math
6
+ import numpy as np
7
+ from scipy.signal import medfilt
8
+ from scipy.spatial import KDTree
9
+ from matplotlib import pyplot as plt
10
+ from PIL import Image
11
+
12
+ from dust3r.inference import inference
13
+
14
+ from dust3r.utils.image import load_images# , resize_images
15
+ from dust3r.image_pairs import make_pairs
16
+ from dust3r.cloud_opt import global_aligner, GlobalAlignerMode
17
+ from dust3r.utils.geometry import find_reciprocal_matches, xy_grid
18
+
19
+ from third_party.utils.camera_utils import remap_points
20
+ from third_party.utils.img_utils import rgba_to_rgb, resize_with_aspect_ratio
21
+ from third_party.utils.img_utils import compute_img_diff
22
+
23
+ from PIL.ImageOps import exif_transpose
24
+ import torchvision.transforms as tvf
25
+ ImgNorm = tvf.Compose([tvf.ToTensor(), tvf.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
26
+
27
+
28
+ def suppress_output(func):
29
+ def wrapper(*args, **kwargs):
30
+ original_stdout = sys.stdout
31
+ original_stderr = sys.stderr
32
+ sys.stdout = io.StringIO()
33
+ sys.stderr = io.StringIO()
34
+ try:
35
+ return func(*args, **kwargs)
36
+ finally:
37
+ sys.stdout = original_stdout
38
+ sys.stderr = original_stderr
39
+ return wrapper
40
+
41
+ def _resize_pil_image(img, long_edge_size):
42
+ S = max(img.size)
43
+ if S > long_edge_size:
44
+ interp = Image.LANCZOS
45
+ elif S <= long_edge_size:
46
+ interp = Image.BICUBIC
47
+ new_size = tuple(int(round(x*long_edge_size/S)) for x in img.size)
48
+ return img.resize(new_size, interp)
49
+
50
+ def resize_images(imgs_list, size, square_ok=False):
51
+ """ open and convert all images in a list or folder to proper input format for DUSt3R
52
+ """
53
+ imgs = []
54
+ for img in imgs_list:
55
+ img = exif_transpose(Image.fromarray(img)).convert('RGB')
56
+ W1, H1 = img.size
57
+ if size == 224:
58
+ # resize short side to 224 (then crop)
59
+ img = _resize_pil_image(img, round(size * max(W1/H1, H1/W1)))
60
+ else:
61
+ # resize long side to 512
62
+ img = _resize_pil_image(img, size)
63
+ W, H = img.size
64
+ cx, cy = W//2, H//2
65
+ if size == 224:
66
+ half = min(cx, cy)
67
+ img = img.crop((cx-half, cy-half, cx+half, cy+half))
68
+ else:
69
+ halfw, halfh = ((2*cx)//16)*8, ((2*cy)//16)*8
70
+ if not (square_ok) and W == H:
71
+ halfh = 3*halfw/4
72
+ img = img.crop((cx-halfw, cy-halfh, cx+halfw, cy+halfh))
73
+
74
+ W2, H2 = img.size
75
+ imgs.append(dict(img=ImgNorm(img)[None], true_shape=np.int32(
76
+ [img.size[::-1]]), idx=len(imgs), instance=str(len(imgs))))
77
+
78
+ return imgs
79
+
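
A quick sketch of the expected input/output of `resize_images`, assuming two placeholder HxWx3 uint8 numpy arrays:

```python
import numpy as np

img_a = np.zeros((480, 640, 3), dtype=np.uint8)  # placeholder images
img_b = np.zeros((512, 512, 3), dtype=np.uint8)

imgs = resize_images([img_a, img_b], size=512, square_ok=True)
# each element is a dict with:
#   img:        1x3xHxW tensor normalized to [-1, 1] by ImgNorm
#   true_shape: np.int32 array [[H, W]] after resizing/cropping
#   idx / instance: running index of the image in the list
```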
80
+ @suppress_output
81
+ def infer_match(images, model, vis=False, niter=300, lr=0.01, schedule='cosine', device="cuda:0"):
82
+ batch_size = 1
83
+ schedule = 'cosine'
84
+ lr = 0.01
85
+ niter = 300
86
+
87
+ images_packed = resize_images(images, size=512, square_ok=True)
88
+ # images_packed = images
89
+
90
+ pairs = make_pairs(images_packed, scene_graph='complete', prefilter=None, symmetrize=True)
91
+ output = inference(pairs, model, device, batch_size=batch_size, verbose=False)
92
+
93
+ scene = global_aligner(output, device=device, mode=GlobalAlignerMode.PointCloudOptimizer)
94
+ loss = scene.compute_global_alignment(init="mst", niter=niter, schedule=schedule, lr=lr)
95
+
96
+ # retrieve useful values from scene:
97
+ imgs = scene.imgs
98
+ # focals = scene.get_focals()
99
+ # poses = scene.get_im_poses()
100
+ pts3d = scene.get_pts3d()
101
+ confidence_masks = scene.get_masks()
102
+
103
+ # visualize reconstruction
104
+ # scene.show()
105
+
106
+ # find 2D-2D matches between the two images
107
+ pts2d_list, pts3d_list = [], []
108
+ for i in range(2):
109
+ conf_i = confidence_masks[i].cpu().numpy()
110
+ pts2d_list.append(xy_grid(*imgs[i].shape[:2][::-1])[conf_i]) # imgs[i].shape[:2] = (H, W)
111
+ pts3d_list.append(pts3d[i].detach().cpu().numpy()[conf_i])
112
+ if pts3d_list[-1].shape[0] == 0:
113
+ return np.zeros((0, 2)), np.zeros((0, 2))
114
+ reciprocal_in_P2, nn2_in_P1, num_matches = find_reciprocal_matches(*pts3d_list)
115
+
116
+ matches_im1 = pts2d_list[1][reciprocal_in_P2]
117
+ matches_im0 = pts2d_list[0][nn2_in_P1][reciprocal_in_P2]
118
+
119
+ # visualize a few matches
120
+ if vis == True:
121
+ print(f'found {num_matches} matches')
122
+ n_viz = 20
123
+ match_idx_to_viz = np.round(np.linspace(0, num_matches - 1, n_viz)).astype(int)
124
+ viz_matches_im0, viz_matches_im1 = matches_im0[match_idx_to_viz], matches_im1[match_idx_to_viz]
125
+
126
+ H0, W0, H1, W1 = *imgs[0].shape[:2], *imgs[1].shape[:2]
127
+ img0 = np.pad(imgs[0], ((0, max(H1 - H0, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
128
+ img1 = np.pad(imgs[1], ((0, max(H0 - H1, 0)), (0, 0), (0, 0)), 'constant', constant_values=0)
129
+ img = np.concatenate((img0, img1), axis=1)
130
+ plt.figure()
131
+ plt.imshow(img)
132
+ cmap = plt.get_cmap('jet')
133
+ for i in range(n_viz):
134
+ (x0, y0), (x1, y1) = viz_matches_im0[i].T, viz_matches_im1[i].T
135
+ plt.plot([x0, x1 + W0], [y0, y1], '-+', color=cmap(i / (n_viz - 1)), scalex=False, scaley=False)
136
+ plt.show(block=True)
137
+
138
+ matches_im0 = remap_points(images[0].shape, matches_im0)
139
+ matches_im1 = remap_points(images[1].shape, matches_im1)
140
+ return matches_im0, matches_im1
141
+
142
+
143
+ def point_transform(H, pt):
144
+ """
145
+ @param: H is homography matrix of dimension (3x3)
146
+ @param: pt is the (x, y) point to be transformed
147
+
148
+ Return:
149
+ returns a transformed point ptrans = H*pt.
150
+ """
151
+ a = H[0, 0] * pt[0] + H[0, 1] * pt[1] + H[0, 2]
152
+ b = H[1, 0] * pt[0] + H[1, 1] * pt[1] + H[1, 2]
153
+ c = H[2, 0] * pt[0] + H[2, 1] * pt[1] + H[2, 2]
154
+ return [a / c, b / c]
155
+
156
+
157
+ def points_transform(H, pt_x, pt_y):
158
+ """
159
+ @param: H is homography matrix of dimension (3x3)
160
+ @param: pt is the (x, y) point to be transformed
161
+
162
+ Return:
163
+ returns a transformed point ptrans = H*pt.
164
+ """
165
+ a = H[0, 0] * pt_x + H[0, 1] * pt_y + H[0, 2]
166
+ b = H[1, 0] * pt_x + H[1, 1] * pt_y + H[1, 2]
167
+ c = H[2, 0] * pt_x + H[2, 1] * pt_y + H[2, 2]
168
+ return (a / c, b / c)
169
+
170
+
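
For intuition, the projective division in `point_transform`/`points_transform` reduces to a plain translation when `H` has last row `[0, 0, 1]`; a minimal numeric check:

```python
import numpy as np

H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])   # pure translation by (+5, -3)

# point_transform(H, (10, 20))                              -> [15.0, 17.0]
# points_transform(H, np.array([10.0]), np.array([20.0]))   -> (array([15.]), array([17.]))
```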
171
+ def motion_propagate(old_points, new_points, old_size, new_size, H_size=(21, 21)):
172
+ """
173
+ @param: old_points are points in old_frame that are
174
+ matched feature points with new_frame
175
+ @param: new_points are points in new_frame that are
176
+ matched feature points with old_frame
177
+ @param: old_frame is the frame to which
178
+ motion mesh needs to be obtained
179
+ @param: H is the homography between old and new points
180
+
181
+ Return:
182
+ returns a motion mesh in x-direction
183
+ and y-direction for old_frame
184
+ """
185
+ # spreads motion over the mesh for the old_frame
186
+ x_motion = np.zeros(H_size)
187
+ y_motion = np.zeros(H_size)
188
+ mesh_x_num, mesh_y_num = H_size[0], H_size[1]
189
+ pixels_x, pixels_y = (old_size[1]) / (mesh_x_num - 1), (old_size[0]) / (mesh_y_num - 1)
190
+ radius = max(pixels_x, pixels_y) * 5
191
+ sigma = radius / 3.0
192
+
193
+ H_global = None
194
+ if old_points.shape[0] > 3:
195
+ # pre-warping with global homography
196
+ H_global, _ = cv2.findHomography(old_points, new_points, cv2.RANSAC)
197
+ if H_global is None:
198
+ old_tmp = np.array([[0, 0], [0, old_size[0]], [old_size[1], 0], [old_size[1], old_size[0]]])
199
+ new_tmp = np.array([[0, 0], [0, new_size[0]], [new_size[1], 0], [new_size[1], new_size[0]]])
200
+ H_global, _ = cv2.findHomography(old_tmp, new_tmp, cv2.RANSAC)
201
+
202
+ for i in range(mesh_x_num):
203
+ for j in range(mesh_y_num):
204
+ pt = [pixels_x * i, pixels_y * j]
205
+ ptrans = point_transform(H_global, pt)
206
+ x_motion[i, j] = ptrans[0]
207
+ y_motion[i, j] = ptrans[1]
208
+
209
+ # distribute the feature motion vectors over the mesh
210
+ weighted_move_x = np.zeros(H_size)
211
+ weighted_move_y = np.zeros(H_size)
212
+ # build a KDTree over the matched feature points
213
+ tree = KDTree(old_points)
214
+ # compute Gaussian weights and the weighted motion for every mesh vertex
215
+ for i in range(mesh_x_num):
216
+ for j in range(mesh_y_num):
217
+ vertex = [pixels_x * i, pixels_y * j]
218
+ neighbor_indices = tree.query_ball_point(vertex, radius, workers=-1)
219
+ if len(neighbor_indices) > 0:
220
+ pts = old_points[neighbor_indices]
221
+ sts = new_points[neighbor_indices]
222
+ ptrans_x, ptrans_y = points_transform(H_global, pts[:, 0], pts[:, 1])
223
+ moves_x = sts[:, 0] - ptrans_x
224
+ moves_y = sts[:, 1] - ptrans_y
225
+
226
+ dists = np.sqrt((vertex[0] - pts[:, 0]) ** 2 + (vertex[1] - pts[:, 1]) ** 2)
227
+ weights_x = np.exp(-(dists ** 2) / (2 * sigma ** 2))
228
+ weights_y = np.exp(-(dists ** 2) / (2 * sigma ** 2))
229
+
230
+ weighted_move_x[i, j] = np.sum(weights_x * moves_x) / (np.sum(weights_x) + 0.1)
231
+ weighted_move_y[i, j] = np.sum(weights_y * moves_y) / (np.sum(weights_y) + 0.1)
232
+
233
+ x_motion_mesh = x_motion + weighted_move_x
234
+ y_motion_mesh = y_motion + weighted_move_y
235
+ '''
236
+ # apply median filter (f-1) on obtained motion for each vertex
237
+ x_motion_mesh = np.zeros((mesh_x_num, mesh_y_num), dtype=float)
238
+ y_motion_mesh = np.zeros((mesh_x_num, mesh_y_num), dtype=float)
239
+
240
+ for key in x_motion.keys():
241
+ try:
242
+ temp_x_motion[key].sort()
243
+ x_motion_mesh[key] = x_motion[key]+temp_x_motion[key][len(temp_x_motion[key])//2]
244
+ except KeyError:
245
+ x_motion_mesh[key] = x_motion[key]
246
+ try:
247
+ temp_y_motion[key].sort()
248
+ y_motion_mesh[key] = y_motion[key]+temp_y_motion[key][len(temp_y_motion[key])//2]
249
+ except KeyError:
250
+ y_motion_mesh[key] = y_motion[key]
251
+
252
+ # apply second median filter (f-2) over the motion mesh for outliers
253
+ #x_motion_mesh = medfilt(x_motion_mesh, kernel_size=[3, 3])
254
+ #y_motion_mesh = medfilt(y_motion_mesh, kernel_size=[3, 3])
255
+ '''
256
+ return x_motion_mesh, y_motion_mesh
257
+
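
In other words (a worked form of the loop above, with v_ij the mesh vertex, p_k / s_k the matched old/new feature points in the neighbourhood N(v_ij, r)): each vertex takes the global homography motion and adds a Gaussian-weighted average of the nearby feature residuals:

```
\Delta_{ij} \;=\; H_{\text{global}}\, v_{ij}
\;+\; \frac{\sum_{k \in \mathcal{N}(v_{ij},\, r)} w_k \,\bigl(s_k - H_{\text{global}}\, p_k\bigr)}
            {\sum_{k} w_k + 0.1},
\qquad
w_k = \exp\!\Bigl(-\tfrac{\lVert v_{ij} - p_k \rVert^2}{2\sigma^2}\Bigr),
\quad \sigma = \tfrac{r}{3},\ \ r = 5\max(\text{pixels}_x, \text{pixels}_y)
```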
258
+
259
+ def mesh_warp_points(points, x_motion_mesh, y_motion_mesh, img_size):
260
+ ptrans = []
261
+ mesh_x_num, mesh_y_num = x_motion_mesh.shape
262
+ pixels_x, pixels_y = (img_size[1]) / (mesh_x_num - 1), (img_size[0]) / (mesh_y_num - 1)
263
+ for pt in points:
264
+ i = int(pt[0] // pixels_x)
265
+ j = int(pt[1] // pixels_y)
266
+ src = [[i * pixels_x, j * pixels_y],
267
+ [(i + 1) * pixels_x, j * pixels_y],
268
+ [i * pixels_x, (j + 1) * pixels_y],
269
+ [(i + 1) * pixels_x, (j + 1) * pixels_y]]
270
+ src = np.asarray(src)
271
+ dst = [[x_motion_mesh[i, j], y_motion_mesh[i, j]],
272
+ [x_motion_mesh[i + 1, j], y_motion_mesh[i + 1, j]],
273
+ [x_motion_mesh[i, j + 1], y_motion_mesh[i, j + 1]],
274
+ [x_motion_mesh[i + 1, j + 1], y_motion_mesh[i + 1, j + 1]]]
275
+ dst = np.asarray(dst)
276
+ H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
277
+ x, y = points_transform(H, pt[0], pt[1])
278
+ ptrans.append([x, y])
279
+
280
+ return np.array(ptrans)
281
+
282
+
283
+ def mesh_warp_frame(frame, x_motion_mesh, y_motion_mesh, resize):
284
+ """
285
+ @param: frame is the current frame
286
+ @param: x_motion_mesh is the motion_mesh to
287
+ be warped on frame along x-direction
288
+ @param: y_motion_mesh is the motion mesh to
289
+ be warped on frame along y-direction
290
+ @param: resize is the desired output size (tuple of width, height)
291
+
292
+ Returns:
293
+ returns a mesh warped frame according
294
+ to given motion meshes x_motion_mesh,
295
+ y_motion_mesh, resized to the specified size
296
+ """
297
+
298
+ map_x = np.zeros(resize, np.float32)
299
+ map_y = np.zeros(resize, np.float32)
300
+
301
+ mesh_x_num, mesh_y_num = x_motion_mesh.shape
302
+ pixels_x, pixels_y = (resize[1]) / (mesh_x_num - 1), (resize[0]) / (mesh_y_num - 1)
303
+
304
+ for i in range(mesh_x_num - 1):
305
+ for j in range(mesh_y_num - 1):
306
+ src = [[i * pixels_x, j * pixels_y],
307
+ [(i + 1) * pixels_x, j * pixels_y],
308
+ [i * pixels_x, (j + 1) * pixels_y],
309
+ [(i + 1) * pixels_x, (j + 1) * pixels_y]]
310
+ src = np.asarray(src)
311
+
312
+ dst = [[x_motion_mesh[i, j], y_motion_mesh[i, j]],
313
+ [x_motion_mesh[i + 1, j], y_motion_mesh[i + 1, j]],
314
+ [x_motion_mesh[i, j + 1], y_motion_mesh[i, j + 1]],
315
+ [x_motion_mesh[i + 1, j + 1], y_motion_mesh[i + 1, j + 1]]]
316
+ dst = np.asarray(dst)
317
+ H, _ = cv2.findHomography(src, dst, cv2.RANSAC)
318
+
319
+ start_x = math.ceil(pixels_x * i)
320
+ end_x = math.ceil(pixels_x * (i + 1))
321
+ start_y = math.ceil(pixels_y * j)
322
+ end_y = math.ceil(pixels_y * (j + 1))
323
+
324
+ x, y = np.meshgrid(range(start_x, end_x), range(start_y, end_y), indexing='ij')
325
+
326
+ map_x[y, x], map_y[y, x] = points_transform(H, x, y)
327
+
328
+ # deforms mesh and directly outputs the resized frame
329
+ resized_frame = cv2.remap(frame, map_x, map_y,
330
+ interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT,
331
+ borderValue=(255, 255, 255))
332
+ return resized_frame
333
+
334
+
335
+ def infer_warp_mesh_img(src, dst, model, vis=False):
336
+ if isinstance(src, str):
337
+ image1 = cv2.imread(src, cv2.IMREAD_UNCHANGED)
338
+ image2 = cv2.imread(dst, cv2.IMREAD_UNCHANGED)
339
+ image1 = cv2.cvtColor(image1, cv2.COLOR_BGR2RGB)
340
+ image2 = cv2.cvtColor(image2, cv2.COLOR_BGR2RGB)
341
+ elif isinstance(src, Image.Image):
342
+ image1 = np.array(src)
343
+ image2 = np.array(dst)
344
+ else:
345
+ assert isinstance(src, np.ndarray)
346
+
347
+ image1 = rgba_to_rgb(image1)
348
+ image2 = rgba_to_rgb(image2)
349
+
350
+ image1_padded = resize_with_aspect_ratio(image1, image2)
351
+ resized_image1 = cv2.resize(image1_padded, (image2.shape[1], image2.shape[0]), interpolation=cv2.INTER_AREA)
352
+
353
+ matches_im0, matches_im1 = infer_match([resized_image1, image2], model, vis=vis)
354
+ matches_im0 = matches_im0 * image1_padded.shape[0] / resized_image1.shape[0]
355
+
356
+ # print('Estimate Mesh Grid')
357
+ mesh_x, mesh_y = motion_propagate(matches_im1, matches_im0, image2.shape[:2], image1_padded.shape[:2])
358
+
359
+ aligned_image = mesh_warp_frame(image1_padded, mesh_x, mesh_y, (image2.shape[0], image2.shape[1]))
360
+
361
+ matches_im0_from_im1 = mesh_warp_points(matches_im1, mesh_x, mesh_y, (image2.shape[1], image2.shape[0]))
362
+
363
+ info = compute_img_diff(aligned_image, image2, matches_im0, matches_im0_from_im1, vis=vis)
364
+
365
+ return aligned_image, info
366
+
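
A hedged usage sketch of `infer_warp_mesh_img` (assumes a DUSt3R checkpoint loaded the same way as in `third_party/mesh_baker.py`; the image file names are placeholders):

```python
from dust3r.model import AsymmetricCroCo3DStereo
from third_party.dust3r_utils import infer_warp_mesh_img

model = AsymmetricCroCo3DStereo.from_pretrained(
    "third_party/weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt").to("cuda:0").eval()

aligned, info = infer_warp_mesh_img("generated_view.png", "rendered_view.png", model)
print(info["mask_iou"], info["match_num"])   # alignment quality metrics used by MeshBaker
```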
third_party/gen_baking.py ADDED
@@ -0,0 +1,288 @@
1
+ import os, sys, time
2
+ from typing import List, Optional
3
+ from iopath.common.file_io import PathManager
4
+
5
+ import cv2
6
+ import imageio
7
+ import numpy as np
8
+ from PIL import Image
9
+ import matplotlib.pyplot as plt
10
+
11
+ import torch
12
+ import torch.nn.functional as F
13
+ from torchvision import transforms
14
+
15
+ import trimesh
16
+ from pytorch3d.io import load_objs_as_meshes, load_obj, save_obj
17
+ from pytorch3d.ops import interpolate_face_attributes
18
+ from pytorch3d.common.datatypes import Device
19
+ from pytorch3d.structures import Meshes
20
+ from pytorch3d.renderer import (
21
+ look_at_view_transform,
22
+ FoVPerspectiveCameras,
23
+ PointLights,
24
+ DirectionalLights,
25
+ AmbientLights,
26
+ Materials,
27
+ RasterizationSettings,
28
+ MeshRenderer,
29
+ MeshRasterizer,
30
+ SoftPhongShader,
31
+ TexturesUV,
32
+ TexturesVertex,
33
+ camera_position_from_spherical_angles,
34
+ BlendParams,
35
+ )
36
+
37
+
38
+ def erode_mask(src_mask, p=1 / 20.0):
39
+ monoMaskImage = cv2.split(src_mask)[0]
40
+ br = cv2.boundingRect(monoMaskImage)
41
+ k = int(min(br[2], br[3]) * p)
42
+ kernel = np.ones((k, k), dtype=np.uint8)
43
+ dst_mask = cv2.erode(src_mask, kernel, 1)
44
+ return dst_mask
45
+
46
+ def load_objs_as_meshes_fast(
47
+ verts,
48
+ faces,
49
+ aux,
50
+ device: Optional[Device] = None,
51
+ load_textures: bool = True,
52
+ create_texture_atlas: bool = False,
53
+ texture_atlas_size: int = 4,
54
+ texture_wrap: Optional[str] = "repeat",
55
+ path_manager: Optional[PathManager] = None,
56
+ ):
57
+ tex = None
58
+ if create_texture_atlas:
59
+ # TexturesAtlas type
60
+ tex = TexturesAtlas(atlas=[aux.texture_atlas.to(device)])
61
+ else:
62
+ # TexturesUV type
63
+ tex_maps = aux.texture_images
64
+ if tex_maps is not None and len(tex_maps) > 0:
65
+ verts_uvs = aux.verts_uvs.to(device) # (V, 2)
66
+ faces_uvs = faces.textures_idx.to(device) # (F, 3)
67
+ image = list(tex_maps.values())[0].to(device)[None]
68
+ tex = TexturesUV(verts_uvs=[verts_uvs], faces_uvs=[faces_uvs], maps=image)
69
+ mesh = Meshes( verts=[verts.to(device)], faces=[faces.verts_idx.to(device)], textures=tex)
70
+ return mesh
71
+
72
+
73
+ def get_triangle_to_triangle(tri_1, tri_2, img_refined):
74
+ '''
75
+ args:
76
+ tri_1:
77
+ tri_2:
78
+ '''
79
+ r1 = cv2.boundingRect(tri_1)
80
+ r2 = cv2.boundingRect(tri_2)
81
+
82
+ tri_1_cropped = []
83
+ tri_2_cropped = []
84
+ for i in range(0, 3):
85
+ tri_1_cropped.append(((tri_1[i][1] - r1[1]), (tri_1[i][0] - r1[0])))
86
+ tri_2_cropped.append(((tri_2[i][1] - r2[1]), (tri_2[i][0] - r2[0])))
87
+
88
+ trans = cv2.getAffineTransform(np.float32(tri_1_cropped), np.float32(tri_2_cropped))
89
+
90
+ img_1_cropped = np.float32(img_refined[r1[0]:r1[0] + r1[2], r1[1]:r1[1] + r1[3]])
91
+
92
+ mask = np.zeros((r2[2], r2[3], 3), dtype=np.float32)
93
+
94
+ cv2.fillConvexPoly(mask, np.int32(tri_2_cropped), (1.0, 1.0, 1.0), 16, 0)
95
+
96
+ img_2_cropped = cv2.warpAffine(
97
+ img_1_cropped, trans, (r2[3], r2[2]), None,
98
+ flags = cv2.INTER_LINEAR,
99
+ borderMode = cv2.BORDER_REFLECT_101
100
+ )
101
+ return mask, img_2_cropped, r2
102
+
103
+
104
+ def back_projection(
105
+ obj_file,
106
+ init_texture_file,
107
+ front_view_file,
108
+ dst_dir,
109
+ render_resolution=512,
110
+ uv_resolution=600,
111
+ normalThreshold=0.3, # 0.3
112
+ rgb_thresh=820, # 520
113
+ views=None,
114
+ camera_dist=1.5,
115
+ erode_scale=1/100.0,
116
+ device="cuda:0"
117
+ ):
118
+ # obj_file: OBJ mesh with UV coordinates
+ # init_texture_file: initial unwrapped UV texture map
+ # render_resolution: rendering resolution of the front view
+ # uv_resolution: resolution of the UV texture map
+ # normalThreshold: normal/view-angle threshold for baking
123
+
124
+ os.makedirs(dst_dir, exist_ok=True)
125
+
126
+ if isinstance(front_view_file, str):
127
+ src = np.array(Image.open(front_view_file).convert("RGB"))
128
+ elif isinstance(front_view_file, Image.Image):
129
+ src = np.array(front_view_file.convert("RGB"))
130
+ else:
131
+ raise "need file_path or pil"
132
+
133
+ image_size = (render_resolution, render_resolution)
134
+
135
+ init_texture = Image.open(init_texture_file)
136
+ init_texture = init_texture.convert("RGB")
137
+ # init_texture = init_texture.resize((uv_resolution, uv_resolution))
138
+ init_texture = np.array(init_texture).astype(np.float32)
139
+
140
+ print("load obj", obj_file)
141
+ verts, faces, aux = load_obj(obj_file, device=device)
142
+ mesh = load_objs_as_meshes_fast(verts, faces, aux, device=device)
143
+
144
+
145
+ t0 = time.time()
146
+ verts_uvs = aux.verts_uvs
147
+ triangle_uvs = verts_uvs[faces.textures_idx]
148
+ triangle_uvs = torch.cat([
149
+ ((1 - triangle_uvs[..., 1]) * uv_resolution).unsqueeze(2),
150
+ (triangle_uvs[..., 0] * uv_resolution).unsqueeze(2),
151
+ ], dim=-1)
152
+ triangle_uvs = np.clip(np.round(np.float32(triangle_uvs.cpu())).astype(np.int64), 0, uv_resolution-1)
153
+
154
+ # import ipdb;ipdb.set_trace()
155
+
156
+
157
+ R0, T0 = look_at_view_transform(camera_dist, views[0][0], views[0][1])
158
+
159
+ cameras = FoVPerspectiveCameras(device=device, R=R0, T=T0, fov=49.1)
160
+
161
+ camera_normal = camera_position_from_spherical_angles(1, views[0][0], views[0][1]).to(device)
162
+ screen_coords = cameras.transform_points_screen(verts, image_size=image_size)[:, :2]
163
+ screen_coords = torch.cat([screen_coords[..., 1, None], screen_coords[..., 0, None]], dim=-1)
164
+ triangle_screen_coords = np.round(np.float32(screen_coords[faces.verts_idx].cpu())) # numpy.ndarray (90000, 3, 2)
165
+ triangle_screen_coords = np.clip(triangle_screen_coords.astype(np.int64), 0, render_resolution-1)
166
+
167
+ renderer = MeshRenderer(
168
+ rasterizer=MeshRasterizer(
169
+ cameras=cameras,
170
+ raster_settings= RasterizationSettings(
171
+ image_size=image_size,
172
+ blur_radius=0.0,
173
+ faces_per_pixel=1,
174
+ ),
175
+ ),
176
+ shader=SoftPhongShader(
177
+ device=device,
178
+ cameras=cameras,
179
+ lights= AmbientLights(device=device),
180
+ blend_params=BlendParams(background_color=(1.0, 1.0, 1.0)),
181
+ )
182
+ )
183
+
184
+ dst = renderer(mesh)
185
+ dst = (dst[..., :3] * 255).squeeze(0).cpu().numpy().astype(np.uint8)
186
+
187
+ src_mask = np.ones((src.shape[0], src.shape[1]), dst.dtype)
188
+ ids = np.where(dst.sum(-1) > 253 * 3)
189
+ ids2 = np.where(src.sum(-1) > 250 * 3)
190
+ src_mask[ids[0], ids[1]] = 0
191
+ src_mask[ids2[0], ids2[1]] = 0
192
+ src_mask = (src_mask > 0).astype(np.uint8) * 255
193
+
194
+ monoMaskImage = cv2.split(src_mask)[0] # reducing the mask to a monochrome
195
+ br = cv2.boundingRect(monoMaskImage) # bounding rect (x,y,width,height)
196
+ center = (br[0] + br[2] // 2, br[1] + br[3] // 2)
197
+
198
+ # seamlessClone
199
+ try:
200
+ images = cv2.seamlessClone(src, dst, src_mask, center, cv2.NORMAL_CLONE) # NORMAL_CLONE keeps the result sharper
201
+ # images = cv2.seamlessClone(src, dst, src_mask, center, cv2.MIXED_CLONE)
202
+ except Exception as err:
203
+ print(f"\n\n Warning seamlessClone error: {err} \n\n")
204
+ images = src
205
+
206
+ Image.fromarray(src_mask).save(os.path.join(dst_dir, 'mask.jpeg'))
207
+ Image.fromarray(src).save(os.path.join(dst_dir, 'src.jpeg'))
208
+ Image.fromarray(dst).save(os.path.join(dst_dir, 'dst.jpeg'))
209
+ Image.fromarray(images).save(os.path.join(dst_dir, 'blend.jpeg'))
210
+
211
+ fragments_scaled = renderer.rasterizer(mesh) # pytorch3d.renderer.mesh.rasterizer.Fragments
212
+ faces_covered = fragments_scaled.pix_to_face.unique()[1:] # torch.Tensor torch.Size([30025])
213
+ face_normals = mesh.faces_normals_packed().to(device) # torch.Tensor torch.Size([90000, 3]) cuda:0
214
+
215
+ # faces: pytorch3d.io.obj_io.Faces
216
+ # faces.textures_idx: torch.Tensor torch.Size([90000, 3])
217
+ # verts_uvs: torch.Tensor torch.Size([49554, 2])
218
+ triangle_uvs = verts_uvs[faces.textures_idx]
219
+ triangle_uvs = [
220
+ ((1 - triangle_uvs[..., 1]) * uv_resolution).unsqueeze(2),
221
+ (triangle_uvs[..., 0] * uv_resolution).unsqueeze(2),
222
+ ]
223
+ triangle_uvs = torch.cat(triangle_uvs, dim=-1) # numpy.ndarray (90000, 3, 2)
224
+ triangle_uvs = np.clip(np.round(np.float32(triangle_uvs.cpu())).astype(np.int64), 0, uv_resolution-1)
225
+
226
+ t0 = time.time()
227
+
228
+
229
+ SOFT_NORM = True # handle faces seen at a large angle: True = down-weight them with a normal-based coefficient, False = skip them
230
+
231
+ for k in faces_covered:
232
+ # todo: accelerate this for-loop
233
+ # if cosine between face-camera is too low, skip current face baking
234
+ face_normal = face_normals[k]
235
+ cosine = torch.sum((face_normal * camera_normal) ** 2)
236
+ if not SOFT_NORM and cosine < normalThreshold: continue
237
+
238
+ # if coord in screen out of subject, skip current face baking
239
+ out_of_subject = src_mask[triangle_screen_coords[k][0][0], triangle_screen_coords[k][0][1]]==0
240
+ if out_of_subject: continue
241
+
242
+ coeff, img_2_cropped, r2 = get_triangle_to_triangle(triangle_screen_coords[k], triangle_uvs[k], images)
243
+
244
+ # if color difference between new-old, skip current face baking
245
+ err = np.abs(init_texture[r2[0]:r2[0]+r2[2], r2[1]:r2[1]+r2[3]]- img_2_cropped)
246
+ err = (err * coeff).sum(-1)
247
+
248
+ # print(err.shape, np.max(err))
249
+ if (np.max(err) > rgb_thresh): continue
250
+
251
+ color_for_debug = None
252
+ # if (np.max(err) > 400): color_for_debug = [255, 0, 0]
253
+ # if (np.max(err) > 450): color_for_debug = [0, 255, 0]
254
+ # if (np.max(err) > 500): color_for_debug = [0, 0, 255]
255
+
256
+ coeff = coeff.clip(0, 1)
257
+
258
+ if SOFT_NORM:
259
+ coeff *= ((cosine.detach().cpu().numpy() - normalThreshold) / normalThreshold).clip(0,1)
260
+
261
+ coeff *= (((rgb_thresh - err[...,None]) / rgb_thresh)**0.4).clip(0,1)
262
+
263
+ if color_for_debug is None:
264
+ init_texture[r2[0]:r2[0]+r2[2], r2[1]:r2[1]+r2[3]] = \
265
+ init_texture[r2[0]:r2[0]+r2[2], r2[1]:r2[1]+r2[3]] * ((1.0,1.0,1.0)-coeff) + img_2_cropped * coeff
266
+ else:
267
+ init_texture[r2[0]:r2[0]+r2[2], r2[1]:r2[1]+r2[3]] = color_for_debug
268
+
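
The per-face update above blends the warped front view into the existing texture with a soft coefficient; schematically, with τ_n = normalThreshold, τ_rgb = rgb_thresh, e the masked color error and a the normal/view alignment score:

```
T \;\leftarrow\; (1 - c)\,T + c\, I_{\text{warped}},
\qquad
c \;=\; c_{\text{tri}} \cdot
\operatorname{clip}\!\Bigl(\tfrac{a - \tau_n}{\tau_n},\, 0, 1\Bigr) \cdot
\operatorname{clip}\!\Bigl(\bigl(\tfrac{\tau_{\text{rgb}} - e}{\tau_{\text{rgb}}}\bigr)^{0.4},\, 0, 1\Bigr)
```

where c_tri is the per-pixel triangle mask returned by `get_triangle_to_triangle`, and faces whose maximum error exceeds τ_rgb are skipped entirely.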
269
+ print(f'View baking time: {time.time() - t0}')
270
+
271
+ bake_dir = os.path.join(dst_dir, 'bake')
272
+ os.makedirs(bake_dir, exist_ok=True)
273
+ os.system(f'cp {obj_file} {bake_dir}')
274
+
275
+ textute_img = Image.fromarray(init_texture.astype(np.uint8))
276
+ textute_img.save(os.path.join(bake_dir, init_texture_file.split("/")[-1]))
277
+
278
+ mtl_dir = obj_file.replace('.obj', '.mtl')
279
+ if not os.path.exists(mtl_dir): mtl_dir = obj_file.replace("mesh.obj" ,"material.mtl")
280
+ if not os.path.exists(mtl_dir): mtl_dir = obj_file.replace("mesh.obj" ,"texture.mtl")
281
+ if not os.path.exists(mtl_dir): import ipdb;ipdb.set_trace()
282
+ os.system(f'cp {mtl_dir} {bake_dir}')
283
+
284
+ # convert .obj to .glb file
285
+ new_obj_pth = os.path.join(bake_dir, obj_file.split('/')[-1])
286
+ new_glb_path = new_obj_pth.replace('.obj', '.glb')
287
+ mesh = trimesh.load_mesh(new_obj_pth)
288
+ mesh.export(new_glb_path, file_type='glb')
third_party/mesh_baker.py ADDED
@@ -0,0 +1,142 @@
1
+ import os, sys, time, traceback
2
+ print("sys path insert", os.path.join(os.path.dirname(__file__), "dust3r"))
3
+ sys.path.insert(0, os.path.join(os.path.dirname(__file__), "dust3r"))
4
+
5
+ import cv2
6
+ import numpy as np
7
+ from PIL import Image, ImageSequence
8
+ from einops import rearrange
9
+ import torch
10
+
11
+ from infer.utils import seed_everything, timing_decorator
12
+ from infer.utils import get_parameter_number, set_parameter_grad_false
13
+
14
+ from dust3r.inference import inference
15
+ from dust3r.model import AsymmetricCroCo3DStereo
16
+
17
+ from third_party.gen_baking import back_projection
18
+ from third_party.dust3r_utils import infer_warp_mesh_img
19
+ from svrm.ldm.vis_util import render_func
20
+
21
+
22
+ class MeshBaker:
23
+ def __init__(
24
+ self,
25
+ align_model = "third_party/weights/DUSt3R_ViTLarge_BaseDecoder_512_dpt",
26
+ device = "cuda:0",
27
+ align_times = 1,
28
+ iou_thresh = 0.8,
29
+ force_baking_ele_list = None,
30
+ save_memory = False
31
+ ):
32
+ self.device = device
33
+ self.save_memory = save_memory
34
+ self.align_model = AsymmetricCroCo3DStereo.from_pretrained(align_model)
35
+ self.align_model = self.align_model if save_memory else self.align_model.to(device)
36
+ self.align_times = align_times
37
+ self.align_model.eval()
38
+ self.iou_thresh = iou_thresh
39
+ self.force_baking_ele_list = [] if force_baking_ele_list is None else force_baking_ele_list
40
+ self.force_baking_ele_list = [int(_) for _ in self.force_baking_ele_list]
41
+ set_parameter_grad_false(self.align_model)
42
+ print('baking align model', get_parameter_number(self.align_model))
43
+
44
+ def align_and_check(self, src, dst, align_times=3):
45
+ try:
46
+ st = time.time()
47
+ best_baking_flag = False
48
+ best_aligned_image = aligned_image = src
49
+ best_info = {'match_num': 1000, "mask_iou": self.iou_thresh-0.1}
50
+ for i in range(align_times):
51
+ aligned_image, info = infer_warp_mesh_img(aligned_image, dst, self.align_model, vis=False)
52
+ aligned_image = Image.fromarray(aligned_image)
53
+ print(f"{i}-th time align process, mask-iou is {info['mask_iou']}")
54
+ if info['mask_iou'] > best_info['mask_iou']:
55
+ best_aligned_image, best_info = aligned_image, info
56
+ if info['mask_iou'] < self.iou_thresh:
57
+ break
58
+ print(f"Best Baking Info:{best_info['mask_iou']}")
59
+ best_baking_flag = best_info['mask_iou'] > self.iou_thresh
60
+ return best_aligned_image, best_info, best_baking_flag
61
+ except Exception as e:
62
+ print(f"Error processing image: {e}")
63
+ traceback.print_exc()
64
+ return None, None, None
65
+
66
+ @timing_decorator("baking mesh")
67
+ def __call__(self, *args, **kwargs):
68
+ if self.save_memory:
69
+ self.align_model = self.align_model.to(self.device)
70
+ torch.cuda.empty_cache()
71
+ res = self.call(*args, **kwargs)
72
+ self.align_model = self.align_model.to("cpu")
73
+ else:
74
+ res = self.call(*args, **kwargs)
75
+ torch.cuda.empty_cache()
76
+ return res
77
+
78
+ def call(self, save_folder):
79
+ obj_path = os.path.join(save_folder, "mesh.obj")
80
+ raw_texture_path = os.path.join(save_folder, "texture.png")
81
+ views_pil = os.path.join(save_folder, "views.jpg")
82
+ views_gif = os.path.join(save_folder, "views.gif")
83
+ cond_pil = os.path.join(save_folder, "img_nobg.png")
84
+
85
+ if os.path.exists(views_pil):
86
+ views_pil = Image.open(views_pil)
87
+ views = rearrange(np.asarray(views_pil, dtype=np.uint8), '(n h) (m w) c -> (n m) h w c', n=3, m=2)
88
+ views = [Image.fromarray(views[idx]).convert('RGB') for idx in [0,2,4,5,3,1]]
89
+ cond_pil = Image.open(cond_pil).resize((512,512))
90
+ elif os.path.exists(views_gif):
91
+ views_gif_pil = Image.open(views_gif)
92
+ views = [img.convert('RGB') for img in ImageSequence.Iterator(views_gif_pil)]
93
+ cond_pil, views = views[0], views[1:]
94
+ else:
95
+ raise FileNotFoundError("views file not found")
96
+
97
+ rendered_views = render_func(obj_path, elev=0, n_views=2)
98
+
99
+ for ele_idx, ele in enumerate([0, 180]):
100
+
101
+ if ele == 0:
102
+ aligned_cond, cond_info, _ = self.align_and_check(cond_pil, rendered_views[0], align_times=self.align_times)
103
+ aligned_cond.save(save_folder + f'/aligned_cond.jpg')
104
+
105
+ aligned_img, info, _ = self.align_and_check(views[0], rendered_views[0], align_times=self.align_times)
106
+ aligned_img.save(save_folder + f'/aligned_{ele}.jpg')
107
+
108
+ if info['mask_iou'] < cond_info['mask_iou']:
109
+ print("Using Cond Image to bake front view")
110
+ aligned_img = aligned_cond
111
+ info = cond_info
112
+ need_baking = info['mask_iou'] > self.iou_thresh
113
+ else:
114
+ aligned_img, info, need_baking = self.align_and_check(views[ele//60], rendered_views[ele_idx])
115
+ aligned_img.save(save_folder + f'/aligned_{ele}.jpg')
116
+
117
+ if need_baking or (ele in self.force_baking_ele_list):
118
+ st = time.time()
119
+ view1_res = back_projection(
120
+ obj_file = obj_path,
121
+ init_texture_file = raw_texture_path,
122
+ front_view_file = aligned_img,
123
+ dst_dir = os.path.join(save_folder, f"view_{ele_idx}"),
124
+ render_resolution = aligned_img.size[0],
125
+ uv_resolution = 1024,
126
+ views = [[0, ele]],
127
+ device = self.device
128
+ )
129
+ print(f"view_{ele_idx} elevation_{ele} baking finished at {time.time() - st}")
130
+ obj_path = os.path.join(save_folder, f"view_{ele_idx}/bake/mesh.obj")
131
+ raw_texture_path = os.path.join(save_folder, f"view_{ele_idx}/bake/texture.png")
132
+ else:
133
+ print(f"Skip view_{ele_idx} elevation_{ele} baking")
134
+
135
+ print("Baking Finished")
136
+ return obj_path
137
+
138
+
139
+ if __name__ == "__main__":
140
+ baker = MeshBaker()
141
+ obj_path = baker("./outputs/test")
142
+ print(obj_path)
third_party/utils/camera_utils.py ADDED
@@ -0,0 +1,90 @@
1
+ import math
2
+ import numpy as np
+ from scipy.spatial.transform import Rotation  # used by the quaternion helpers below
3
+
4
+ def compute_extrinsic_matrix(elevation, azimuth, camera_distance):
5
+ # Convert angles to radians
6
+ elevation_rad = np.radians(elevation)
7
+ azimuth_rad = np.radians(azimuth)
8
+
9
+ R = np.array([
10
+ [np.cos(azimuth_rad), 0, -np.sin(azimuth_rad)],
11
+ [0, 1, 0],
12
+ [np.sin(azimuth_rad), 0, np.cos(azimuth_rad)],
13
+ ], dtype=np.float32)
14
+
15
+ R = R @ np.array([
16
+ [1, 0, 0],
17
+ [0, np.cos(elevation_rad), -np.sin(elevation_rad)],
18
+ [0, np.sin(elevation_rad), np.cos(elevation_rad)]
19
+ ], dtype=np.float32)
20
+
21
+ # Construct translation matrix T (3x1)
22
+ T = np.array([[camera_distance], [0], [0]], dtype=np.float32)
23
+ T = R @ T
24
+
25
+ # Combined into a 4x4 transformation matrix
26
+ extrinsic_matrix = np.vstack((np.hstack((R, T)), np.array([[0, 0, 0, 1]], dtype=np.float32)))
27
+
28
+ return extrinsic_matrix
29
+
30
+
31
+ def transform_camera_pose(im_pose, ori_pose, new_pose):
32
+ T = new_pose @ ori_pose.T
33
+ transformed_poses = []
34
+
35
+ for pose in im_pose:
36
+ transformed_pose = T @ pose
37
+ transformed_poses.append(transformed_pose)
38
+
39
+ return transformed_poses
40
+
41
+ def compute_fov(intrinsic_matrix):
42
+ # Get the focal length value in the internal parameter matrix
43
+ fx = intrinsic_matrix[0, 0]
44
+ fy = intrinsic_matrix[1, 1]
45
+
46
+ w, h = intrinsic_matrix[0, 2] * 2, intrinsic_matrix[1, 2] * 2  # image width/height from the principal point
47
+
48
+ # Calculate horizontal and vertical FOV values
49
+ fov_x = 2 * math.atan(w / (2 * fx)) * 180 / math.pi
50
+ fov_y = 2 * math.atan(h / (2 * fy)) * 180 / math.pi
51
+
52
+ return fov_x, fov_y
53
+
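
As a sanity check (hypothetical intrinsics for a 512x512 image with fx = fy = 560), `compute_fov` recovers a field of view close to the 49.1 degrees hard-coded for `FoVPerspectiveCameras` in `third_party/gen_baking.py`:

```python
import numpy as np

K = np.array([[560.0,   0.0, 256.0],
              [  0.0, 560.0, 256.0],
              [  0.0,   0.0,   1.0]])

# compute_fov(K) -> (2*atan(512/(2*560)), 2*atan(512/(2*560))) in degrees, i.e. roughly (49.1, 49.1)
```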
54
+
55
+
56
+ def rotation_matrix_to_quaternion(rotation_matrix):
57
+ rot = Rotation.from_matrix(rotation_matrix)
58
+ quaternion = rot.as_quat()
59
+ return quaternion
60
+
61
+ def quaternion_to_rotation_matrix(quaternion):
62
+ rot = Rotation.from_quat(quaternion)
63
+ rotation_matrix = rot.as_matrix()
64
+ return rotation_matrix
65
+
66
+ def remap_points(img_size, match, size=512):
67
+ H, W, _ = img_size
68
+
69
+ S = max(W, H)
70
+ new_W = int(round(W * size / S))
71
+ new_H = int(round(H * size / S))
72
+ cx, cy = new_W // 2, new_H // 2
73
+
74
+ # Calculate the coordinates of the transformed image center point
75
+ halfw, halfh = ((2 * cx) // 16) * 8, ((2 * cy) // 16) * 8
76
+
77
+ dw, dh = cx - halfw, cy - halfh
78
+
79
+ # store point coordinates mapped back to the original image
80
+ new_match = np.zeros_like(match)
81
+
82
+ # Map the transformed point coordinates back to the original image
83
+ new_match[:, 0] = (match[:, 0] + dw) / new_W * W
84
+ new_match[:, 1] = (match[:, 1] + dh) / new_H * H
85
+
86
+ #print(dw,new_W,W,dh,new_H,H)
87
+
88
+ return new_match
89
+
90
+
third_party/utils/img_utils.py ADDED
@@ -0,0 +1,211 @@
1
+ import os
2
+ import cv2
3
+ import numpy as np
4
+ from skimage.metrics import hausdorff_distance
5
+ from matplotlib import pyplot as plt
6
+
7
+
8
+ def get_input_imgs_path(input_data_dir):
9
+ path = {}
10
+ names = ['000', 'ori_000']
11
+ for name in names:
12
+ jpg_path = os.path.join(input_data_dir, f"{name}.jpg")
13
+ png_path = os.path.join(input_data_dir, f"{name}.png")
14
+ if os.path.exists(jpg_path):
15
+ path[name] = jpg_path
16
+ elif os.path.exists(png_path):
17
+ path[name] = png_path
18
+ return path
19
+
20
+
21
+ def rgba_to_rgb(image, bg_color=[255, 255, 255]):
22
+ if image.shape[-1] == 3: return image
23
+
24
+ rgba = image.astype(float)
25
+ rgb = rgba[:, :, :3].copy()
26
+ alpha = rgba[:, :, 3] / 255.0
27
+
28
+ bg = np.ones((image.shape[0], image.shape[1], 3), dtype=np.float32)
29
+ bg = bg * np.array(bg_color, dtype=np.float32)
30
+
31
+ rgb = rgb * alpha[:, :, np.newaxis] + bg * (1 - alpha[:, :, np.newaxis])
32
+ rgb = rgb.astype(np.uint8)
33
+
34
+ return rgb
35
+
36
+
37
+ def resize_with_aspect_ratio(image1, image2, pad_value=[255, 255, 255]):
38
+ aspect_ratio1 = float(image1.shape[1]) / float(image1.shape[0])
39
+ aspect_ratio2 = float(image2.shape[1]) / float(image2.shape[0])
40
+
41
+ top_pad, bottom_pad, left_pad, right_pad = 0, 0, 0, 0
42
+
43
+ if aspect_ratio1 < aspect_ratio2:
44
+ new_width = (aspect_ratio2 * image1.shape[0])
45
+ right_pad = left_pad = int((new_width - image1.shape[1]) / 2)
46
+ else:
47
+ new_height = (image1.shape[1] / aspect_ratio2)
48
+ bottom_pad = top_pad = int((new_height - image1.shape[0]) / 2)
49
+
50
+ image1_padded = cv2.copyMakeBorder(
51
+ image1, top_pad, bottom_pad, left_pad, right_pad, cv2.BORDER_CONSTANT, value=pad_value
52
+ )
53
+ return image1_padded
54
+
55
+
56
+ def estimate_img_mask(image):
57
+ # to gray
58
+ gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
59
+
60
+ # segment
61
+ # _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
62
+ # mask_otsu = thresh.astype(bool)
63
+ # thresh_gray = 240
64
+
65
+ edges = cv2.Canny(gray, 20, 50)
66
+
67
+ kernel = np.ones((3, 3), np.uint8)
68
+ edges_dilated = cv2.dilate(edges, kernel, iterations=1)
69
+
70
+ contours, _ = cv2.findContours(edges_dilated, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
71
+
72
+ mask = np.zeros_like(gray, dtype=np.uint8)
73
+
74
+ cv2.drawContours(mask, contours, -1, 255, thickness=cv2.FILLED)
75
+ mask = mask.astype(bool)
76
+
77
+ return mask
78
+
79
+
80
+ def compute_img_diff(img1, img2, matches1, matches1_from_2, vis=False):
81
+ scale = 0.125
82
+ gray_trunc_thres = 25 / 255.0
83
+
84
+ # Match
85
+ if matches1.shape[0] > 0:
86
+ match_scale = np.max(np.ptp(matches1, axis=-1))
87
+ match_dists = np.sqrt(np.sum((matches1 - matches1_from_2) ** 2, axis=-1))
88
+ dist_threshold = match_scale * 0.01
89
+ match_num = np.sum(match_dists <= dist_threshold)
90
+ match_rate = np.mean(match_dists <= dist_threshold)
91
+ else:
92
+ match_num = 0
93
+ match_rate = 0
94
+
95
+ # IOU
96
+ img1_mask = estimate_img_mask(img1)
97
+ img2_mask = estimate_img_mask(img2)
98
+ img_intersection = (img1_mask == 1) & (img2_mask == 1)
99
+ img_union = (img1_mask == 1) | (img2_mask == 1)
100
+ intersection = np.sum(img_intersection == 1)
101
+ union = np.sum(img_union == 1)
102
+ mask_iou = intersection / union if union != 0 else 0
103
+
104
+ # Gray
105
+ height, width = img1.shape[:2]
106
+ img1_gray = cv2.cvtColor(img1, cv2.COLOR_BGR2GRAY)
107
+ img2_gray = cv2.cvtColor(img2, cv2.COLOR_BGR2GRAY)
108
+ img1_gray = cv2.GaussianBlur(img1_gray, (7, 7), 0)
109
+ img2_gray = cv2.GaussianBlur(img2_gray, (7, 7), 0)
110
+
111
+ # Gray Diff
112
+ img1_gray_small = cv2.resize(img1_gray, (int(width * scale), int(height * scale)),
113
+ interpolation=cv2.INTER_LINEAR) / 255.0
114
+ img2_gray_small = cv2.resize(img2_gray, (int(width * scale), int(height * scale)),
115
+ interpolation=cv2.INTER_LINEAR) / 255.0
116
+ img_gray_small_diff = np.abs(img1_gray_small - img2_gray_small)
117
+ gray_diff = img_gray_small_diff.sum() / (union * scale) if union != 0 else 1
118
+
119
+ img_gray_small_diff_trunc = img_gray_small_diff.copy()
120
+ img_gray_small_diff_trunc[img_gray_small_diff < gray_trunc_thres] = 0
121
+ gray_diff_trunc = img_gray_small_diff_trunc.sum() / (union * scale) if union != 0 else 1
122
+
123
+ # Edge
124
+ img1_edge = cv2.Canny(img1_gray, 100, 200)
125
+ img2_edge = cv2.Canny(img2_gray, 100, 200)
126
+ bw_edges1 = (img1_edge > 0).astype(bool)
127
+ bw_edges2 = (img2_edge > 0).astype(bool)
128
+ hausdorff_dist = hausdorff_distance(bw_edges1, bw_edges2)
129
+ if vis == True:
130
+ fig, axs = plt.subplots(1, 4, figsize=(15, 5))
131
+ axs[0].imshow(img1_gray, cmap='gray')
132
+ axs[0].set_title('Img1')
133
+ axs[1].imshow(img2_gray, cmap='gray')
134
+ axs[1].set_title('Img2')
135
+ axs[2].imshow(img1_mask)
136
+ axs[2].set_title('Mask1')
137
+ axs[3].imshow(img2_mask)
138
+ axs[3].set_title('Mask2')
139
+ plt.show()
140
+ plt.figure()
141
+ mask_cmp = np.zeros((height, width, 3))
142
+ mask_cmp[img_intersection, 1] = 1
143
+ mask_cmp[img_union, 0] = 1
144
+ plt.imshow(mask_cmp)
145
+ plt.show()
146
+ fig, axs = plt.subplots(1, 4, figsize=(15, 5))
147
+ axs[0].imshow(img1_gray_small, cmap='gray')
148
+ axs[0].set_title('Img1 Gray')
149
+ axs[1].imshow(img2_gray_small, cmap='gray')
150
+ axs[1].set_title('Img2 Gray')
151
+ axs[2].imshow(img_gray_small_diff, cmap='gray')
152
+ axs[2].set_title('diff')
153
+ axs[3].imshow(img_gray_small_diff_trunc, cmap='gray')
154
+ axs[3].set_title('diff_trunc')
155
+ plt.show()
156
+ fig, axs = plt.subplots(1, 2, figsize=(15, 5))
157
+ axs[0].imshow(img1_edge, cmap='gray')
158
+ axs[0].set_title('img1_edge')
159
+ axs[1].imshow(img2_edge, cmap='gray')
160
+ axs[1].set_title('img2_edge')
161
+ plt.show()
162
+
163
+ info = {}
164
+ info['match_num'] = match_num
165
+ info['match_rate'] = match_rate
166
+ info['mask_iou'] = mask_iou
167
+ info['gray_diff'] = gray_diff
168
+ info['gray_diff_trunc'] = gray_diff_trunc
169
+ info['hausdorff_dist'] = hausdorff_dist
170
+ return info
171
+
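
Sketch of how the returned metrics are meant to be consumed, assuming `aligned_view`, `rendered_view` and the match arrays come out of `infer_warp_mesh_img` / `mesh_warp_points` in `third_party/dust3r_utils.py` (the thresholds live in `predict_match_success_human` below; `MeshBaker` additionally gates on `mask_iou` alone):

```python
info = compute_img_diff(aligned_view, rendered_view, matches_im0, matches_im0_from_im1)
if predict_match_success(info):   # hand-tuned rules when no classifier model is given
    print("alignment good enough, bake this view")
else:
    print("skip baking for this view")
```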
172
+
173
+ def predict_match_success_human(info):
174
+ match_num = info['match_num']
175
+ match_rate = info['match_rate']
176
+ mask_iou = info['mask_iou']
177
+ gray_diff = info['gray_diff']
178
+ gray_diff_trunc = info['gray_diff_trunc']
179
+ hausdorff_dist = info['hausdorff_dist']
180
+
181
+ if mask_iou > 0.95:
182
+ return True
183
+
184
+ if match_num < 20 or match_rate < 0.7:
185
+ return False
186
+
187
+ if mask_iou > 0.80 and gray_diff < 0.040 and gray_diff_trunc < 0.010:
188
+ return True
189
+
190
+ if mask_iou > 0.70 and gray_diff < 0.050 and gray_diff_trunc < 0.008:
191
+ return True
192
+
193
+ '''
194
+ if match_rate<0.70 or match_num<3000:
195
+ return False
196
+
197
+ if (mask_iou>0.85 and hausdorff_dist<20)or (gray_diff<0.015 and gray_diff_trunc<0.01) or match_rate>=0.90:
198
+ return True
199
+ '''
200
+
201
+ return False
202
+
203
+
204
+ def predict_match_success(info, model=None):
205
+ if model == None:
206
+ return predict_match_success_human(info)
207
+ else:
208
+ feat_name = ['match_num', 'match_rate', 'mask_iou', 'gray_diff', 'gray_diff_trunc', 'hausdorff_dist']
209
+ features = [info[f] for f in feat_name]
210
+ pred = model.predict([features])[0]
211
+ return pred >= 0.5