Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data

Zhiyuan Ma Xinyue Liang Rongyuan Wu Xiangyu Zhu Zhen Lei Lei Zhang

⏩ Updates

2025-04-01: Presentation slides are now available for download.
2025-03-27: The paper is now available on arXiv.
2025-03-03: Gradio and HuggingFace Demos are available.
2025-02-27: TriplaneTurbo is accepted to CVPR 2025.

🌟 Features

Fast Inference 🚀: Generates a textured mesh from a text prompt in around 1 second.
Text Comprehension 🆙: Handles complex text prompts well, so the generated mesh follows the input description.
3D-Data-Free Training 🙅‍♂️: The entire training process doesn't rely on any 3D datasets, making it more resource-friendly and adaptable.

🤖 Start local inference in 3 minutes

If you only wish to set up the demo locally, run the following commands. For training and evaluation, follow the environment setup in the next section instead.

python -m venv venv
source venv/bin/activate
bash setup.sh
python gradio_app.py

🛠️ Official Installation

Create a virtual environment:

conda create -n triplaneturbo python=3.10
conda activate triplaneturbo
conda install pytorch==2.2.0 torchvision==0.17.0 torchaudio==2.2.0 pytorch-cuda=12.1 -c pytorch -c nvidia
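
Once the environment is created, a quick check can confirm that the pinned packages are importable and that PyTorch can see a GPU. The snippet below is a minimal sketch: the helper name `report_environment` is hypothetical and not part of the repository, and `torch` is imported lazily so the check also runs where PyTorch is missing.

```python
import importlib.util
import sys


def report_environment(packages=("torch", "torchvision", "torchaudio")):
    """Return the Python version and whether each package is importable."""
    report = {"python": "%d.%d" % sys.version_info[:2]}
    for name in packages:
        # find_spec returns None when a top-level package is not installed.
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    info = report_environment()
    print(info)
    if info["torch"]:
        import torch  # lazy import: only attempted when PyTorch is present

        print("torch", torch.__version__,
              "| CUDA available:", torch.cuda.is_available())
```

If `CUDA available` prints `False`, the `pytorch-cuda` build or the driver setup is worth checking before moving on to the CUDA extensions.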

(Optional, Recommended) Install xFormers for attention acceleration:

conda install xformers -c xformers

(Optional, Recommended) Install ninja to speed up the compilation of CUDA extensions:

pip install ninja

Install major dependencies:

pip install -r requirements.txt

Install iNGP:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64:$LD_LIBRARY_PATH"
pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

If you encounter errors while installing iNGP, check your gcc version. The following steps change the gcc version within your conda environment. After that, return to the project directory and reinstall iNGP and NerfAcc:

conda install -c conda-forge gxx=9.5.0
cd $CONDA_PREFIX/lib
ln -s /usr/lib/x86_64-linux-gnu/libcuda.so ./
cd <your project directory>
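
Before changing compilers, it can help to see which gcc is currently on `PATH`. The following is a minimal sketch with hypothetical helper names. It relies on the standard `gcc -dumpversion` flag. Comparing against major version 9 is an assumption taken from the `gxx=9.5.0` pin above, not a documented requirement.

```python
import re
import shutil
import subprocess


def parse_gcc_major(version_text):
    """Extract the major version from `gcc -dumpversion`-style output."""
    match = re.match(r"\s*(\d+)", version_text)
    return int(match.group(1)) if match else None


def current_gcc_major():
    """Return the major version of the gcc on PATH, or None if gcc is absent."""
    if shutil.which("gcc") is None:
        return None
    result = subprocess.run(["gcc", "-dumpversion"],
                            capture_output=True, text=True)
    return parse_gcc_major(result.stdout)


if __name__ == "__main__":
    major = current_gcc_major()
    if major is None:
        print("gcc not found on PATH")
    elif major != 9:  # assumption: 9 mirrors the gxx=9.5.0 pin above
        print(f"gcc {major} found; the steps above install gxx 9.5.0")
    else:
        print("gcc 9 found")
```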

📊 Evaluation

If you only want to run the evaluation without training, follow these steps:

# Download the model from HuggingFace
huggingface-cli download --resume-download ZhiyuanthePony/TriplaneTurbo \
    --include "triplane_turbo_sd_v1.pth" \
    --local-dir ./pretrained \
    --local-dir-use-symlinks False

# Download evaluation assets
python scripts/prepare/download_eval_only.py

# Run evaluation script
bash scripts/eval/dreamfusion.sh --gpu 0,1 # You can use more GPUs (e.g. 0,1,2,3,4,5,6,7). For single GPU usage, please check the script for required modifications
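
The checkpoint download above can also be done from Python with `huggingface_hub.hf_hub_download`. This is a minimal sketch: the repository ID, filename, and target directory are copied from the CLI command, while the helper names and the skip-if-present behaviour are assumptions made for illustration.

```python
from pathlib import Path

REPO_ID = "ZhiyuanthePony/TriplaneTurbo"   # taken from the CLI command above
CHECKPOINT = "triplane_turbo_sd_v1.pth"    # taken from the CLI command above


def expected_checkpoint_path(local_dir="./pretrained", filename=CHECKPOINT):
    """Path where the checkpoint is expected after download."""
    return Path(local_dir) / filename


def download_checkpoint(local_dir="./pretrained", filename=CHECKPOINT):
    """Download the checkpoint unless it is already present locally."""
    target = expected_checkpoint_path(local_dir, filename)
    if target.exists():
        return target
    # Lazy import so the path helper works without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return Path(hf_hub_download(repo_id=REPO_ID, filename=filename,
                                local_dir=local_dir))


if __name__ == "__main__":
    print(download_checkpoint())
```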

Our evaluation metrics include:

CLIP Similarity Score
CLIP Recall@1

For detailed evaluation results, please refer to our paper.
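
To make the two metrics concrete, here is a minimal sketch of how they are commonly defined: CLIP similarity as the mean cosine similarity between each rendering and its own prompt, and Recall@1 as the fraction of renderings whose most similar prompt is the correct one. The function names and toy vectors are illustrative assumptions. The repository's `evaluation/clipscore/compute.py` may differ in details such as view averaging or scaling.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def clip_metrics(image_embs, text_embs):
    """image_embs[i] embeds the rendering for prompt i; text_embs[i] embeds prompt i.

    Returns (mean matched similarity, Recall@1).
    """
    n = len(image_embs)
    matched_total = 0.0
    hits = 0
    for i, image in enumerate(image_embs):
        row = [cosine(image, text) for text in text_embs]
        matched_total += row[i]
        # Recall@1: the best-scoring prompt must be the rendering's own prompt.
        best = max(range(len(row)), key=lambda j: row[j])
        hits += int(best == i)
    return matched_total / n, hits / n


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real CLIP embeddings.
    images = [[1.0, 0.0], [1.0, 0.1]]
    texts = [[1.0, 0.0], [0.0, 1.0]]
    print(clip_metrics(images, texts))
```

In the toy example the second rendering is closer to the first prompt than to its own, so it counts as a miss for Recall@1.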

If you want to evaluate your own model, use the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config <path_to_your_exp_config> \
    --export \
    system.exporter_type="multiprompt-mesh-exporter" \
    resume=<path_to_your_ckpt> \
    data.prompt_library="dreamfusion_415_prompt_library" \
    system.exporter.fmt=obj

After running the script, you will find generated OBJ files in outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-export>. Set this path as <OBJ_DIR>, and set outputs/<your_exp>/dreamfusion_415_prompt_library/save/<itXXXXX-4views> as <VIEW_DIR>. Then run:

SAVE_DIR=<VIEW_DIR>
python evaluation/mesh_visualize.py \
    <OBJ_DIR> \
    --save_dir $SAVE_DIR \
    --gpu 0,1,2,3,4,5,6,7

python evaluation/clipscore/compute.py \
    --result_dir $SAVE_DIR

The evaluation results will be displayed in your terminal once the computation is complete.
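
The two directories used above can be located programmatically. This sketch assumes the layout described in this section (`save/itXXXXX-export` and `save/itXXXXX-4views`), that `XXXXX` is a decimal iteration number, and that the highest iteration is the one you want. The helper name is hypothetical.

```python
import re
from pathlib import Path


def latest_eval_dirs(exp_dir, library="dreamfusion_415_prompt_library"):
    """Return (OBJ_DIR, VIEW_DIR) for the highest-iteration export under exp_dir."""
    save_dir = Path(exp_dir) / library / "save"
    pattern = re.compile(r"it(\d+)-export$")
    candidates = []
    for path in save_dir.iterdir():
        match = pattern.match(path.name)
        if path.is_dir() and match:
            candidates.append((int(match.group(1)), path))
    if not candidates:
        raise FileNotFoundError(f"no it*-export directory under {save_dir}")
    _, obj_dir = max(candidates, key=lambda item: item[0])
    # The view directory shares the iteration prefix and ends in -4views.
    view_dir = obj_dir.with_name(obj_dir.name.replace("-export", "-4views"))
    return obj_dir, view_dir
```

For example, `latest_eval_dirs("outputs/<your_exp>")` returns the pair to pass as `<OBJ_DIR>` and `<VIEW_DIR>`.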

🚀 Training Options

1. Download Required Pretrained Models and Datasets

Use the provided download script to get all necessary files:

python scripts/prepare/download_full.py

This will download:

Stable Diffusion 2.1 Base
Stable Diffusion 1.5
MVDream 4-view checkpoint
RichDreamer checkpoint
Text prompt datasets (3DTopia and DALLE+Midjourney)

2. Training Options

Option 1: Train with 3DTopia Text Prompts

# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"

For multi-GPU training:

# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="3DTopia_361k_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_3DTopia"

Option 2: Train with DALLE+Midjourney Text Prompts

Choose the appropriate command based on your GPU configuration:

# Single GPU
CUDA_VISIBLE_DEVICES=0 python launch.py \
    --config configs/TriplaneTurbo_v0_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"

For multi-GPU training (higher performance):

# 8 GPUs with 48GB+ memory each
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python launch.py \
    --config configs/TriplaneTurbo_v1_acc-2.yaml \
    --train \
    data.prompt_library="DALLE_Midjourney_prompt_library" \
    data.condition_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ" \
    data.guidance_processor.cache_dir=".threestudio_cache/text_embeddings_DE+MJ"
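
The four training commands above differ only in the visible GPUs, the config file, the prompt library, and the cache directory. A small helper can assemble them. This is a sketch with a hypothetical function name. It only builds the command and does not validate that the config or prompt library exists.

```python
import shlex


def build_train_command(config, prompt_library, cache_dir, gpu_ids):
    """Return (extra environment variables, argv) for a launch.py training run."""
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpu_ids)}
    argv = [
        "python", "launch.py",
        "--config", config,
        "--train",
        f"data.prompt_library={prompt_library}",
        # Both processors share one text-embedding cache in the commands above.
        f"data.condition_processor.cache_dir={cache_dir}",
        f"data.guidance_processor.cache_dir={cache_dir}",
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_train_command(
        config="configs/TriplaneTurbo_v1_acc-2.yaml",
        prompt_library="DALLE_Midjourney_prompt_library",
        cache_dir=".threestudio_cache/text_embeddings_DE+MJ",
        gpu_ids=range(8),
    )
    prefix = " ".join(f"{k}={v}" for k, v in env.items())
    print(prefix, shlex.join(argv))
```

To actually launch the run, pass `argv` to `subprocess.run` with `env` merged into `os.environ`.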

3. Configuration Notes

Memory Requirements:
- v1 configuration: Requires GPUs with 48GB+ memory
- v0 configuration: Works with GPUs that have less memory (46GB+) but with reduced performance
Acceleration Options:
- Use _acc-2.yaml configs for gradient accumulation to reduce memory usage
Advanced Options:
- For highest quality, use configs/TriplaneTurbo_v1.yaml with system.parallel_guidance=true (requires GPUs with 98GB+ memory)
- To disable certain guidance components: add guidance.rd_weight=0 guidance.sd_weight=0 to the command
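
The memory guidance above can be read as a simple lookup. The sketch below encodes the stated thresholds (98GB, 48GB, 46GB). Treating them as hard cut-offs is an assumption, and the function name is hypothetical.

```python
def suggest_config(gpu_memory_gb):
    """Map per-GPU memory (in GB) to a config path and extra overrides."""
    if gpu_memory_gb >= 98:
        # Highest-quality setting from the notes above.
        return "configs/TriplaneTurbo_v1.yaml", ["system.parallel_guidance=true"]
    if gpu_memory_gb >= 48:
        return "configs/TriplaneTurbo_v1_acc-2.yaml", []
    if gpu_memory_gb >= 46:
        # v0 fits in less memory but with reduced performance.
        return "configs/TriplaneTurbo_v0_acc-2.yaml", []
    return None, []


if __name__ == "__main__":
    for memory in (24, 46, 48, 98):
        print(memory, suggest_config(memory))
```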

📜 Citation

If you find this work helpful, please consider citing our paper:

@inproceedings{ma2025progressive,
  title={Progressive Rendering Distillation: Adapting Stable Diffusion for Instant Text-to-Mesh Generation without 3D Data},
  author={Ma, Zhiyuan and Liang, Xinyue and Wu, Rongyuan and Zhu, Xiangyu and Lei, Zhen and Zhang, Lei},
  booktitle={Proceedings of the IEEE/CVF conference on computer vision and pattern recognition},
  year={2025}
}

🙏 Acknowledgement

Our code is heavily based on the following works:

ThreeStudio: A clean and extensible codebase for 3D generation via Score Distillation.
MVDream: Used as one of our multi-view teachers.
RichDreamer: Serves as another multi-view teacher for normal and depth supervision.
3DTopia: Its text caption dataset is applied in our training and comparison.
DiffMC: Our solution uses its differentiable marching cubes for mesh rasterization.
NeuS: We implement its SDF-based volume rendering for dual rendering in our solution.
