HunyuanDiT TensorRT Acceleration


We provide a TensorRT version of HunyuanDiT for faster inference (faster than flash attention). You can convert the PyTorch model to a TensorRT model with the following steps, which are based on TensorRT 10.1.0.27 and CUDA 11.7 or 11.8.

⚠️ Important Reminder (suggestions for testing the TensorRT acceleration version):
We recommend testing the TensorRT version on NVIDIA GPUs with Compute Capability >= 8.0 (for example, RTX 4090, RTX 3090, H800, A10/A100/A800). You can look up your GPU's Compute Capability in NVIDIA's published list of CUDA GPUs. On GPUs with Compute Capability < 8.0, you may find that the TensorRT engine file cannot be generated or that inference performance is poor. The main reason is that TensorRT does not support the fused multi-head attention (MHA) kernel on these architectures.
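To check the Compute Capability >= 8.0 recommendation locally, you can ask PyTorch, which HunyuanDiT already uses. The sketch below assumes a CUDA-enabled PyTorch install; `meets_trt_recommendation` is a hypothetical helper, not part of the repository.

```python
# Minimal sketch: check whether the visible GPU meets the recommended
# Compute Capability (>= 8.0) for the TensorRT version.
# `meets_trt_recommendation` is a hypothetical helper, not a repo API.

MIN_CAPABILITY = (8, 0)


def meets_trt_recommendation(capability):
    """Return True if a (major, minor) Compute Capability is >= 8.0."""
    # Tuple comparison checks major first, then minor.
    return tuple(capability) >= MIN_CAPABILITY


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed; cannot query the GPU.")
        return
    if not torch.cuda.is_available():
        print("No CUDA device is visible to PyTorch.")
        return
    name = torch.cuda.get_device_name(0)
    major, minor = torch.cuda.get_device_capability(0)
    ok = meets_trt_recommendation((major, minor))
    status = "meets the recommendation" if ok else "below 8.0, expect build or speed problems"
    print(f"{name}: compute capability {major}.{minor} ({status})")


if __name__ == "__main__":
    main()
```

For example, an RTX 3090 reports `(8, 6)` and passes, while a GPU reporting `(7, 5)` does not.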

🛠 Instructions

1. Download dependencies from Hugging Face.

cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/TensorRT-libs --local-dir ./ckpts/t2i/model_trt
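If you prefer Python to the CLI, the same download can likely be done with `snapshot_download` from the `huggingface_hub` package. This is a sketch assuming `huggingface_hub` is installed and the script runs from the `HunyuanDiT` root; `download_trt_libs` is a hypothetical wrapper, and its `downloader` parameter exists only so the helper can be exercised without network access.

```python
# Sketch: Python equivalent of the huggingface-cli command above.
# Assumes `pip install huggingface_hub`; download_trt_libs is hypothetical.
from pathlib import Path

REPO_ID = "Tencent-Hunyuan/TensorRT-libs"
LOCAL_DIR = Path("./ckpts/t2i/model_trt")


def download_trt_libs(local_dir=LOCAL_DIR, downloader=None):
    """Download the whole TensorRT-libs repo into local_dir and return the path."""
    if downloader is None:
        # Imported lazily so the helper can be tested without the package.
        from huggingface_hub import snapshot_download as downloader
    return downloader(repo_id=REPO_ID, local_dir=str(local_dir))
```

Calling `download_trt_libs()` from the repository root should place the files in `./ckpts/t2i/model_trt`, the same location the CLI command uses.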

2. Install the TensorRT dependencies.

# Extract and install the TensorRT dependencies.
sh trt/install.sh

# Set the TensorRT build environment variables. We provide a script to set up the environment.
source trt/activate.sh

3. Build the TensorRT engine.

Method 1: Build your own engine (Recommended)

If you are using a GPU for which no prebuilt engine is provided, build the engine with the following command.

Hunyuan-DiT v1.2

# Build the TensorRT engine. By default, it will read the `ckpts` folder in the current directory.
sh trt/build_engine.sh

Previous versions (Hunyuan-DiT <= v1.1)

# v1.1 
sh trt/build_engine.sh 1.1
# v1.0 
sh trt/build_engine.sh 1.0

If the output contains a line such as `&&&& PASSED TensorRT.trtexec [TensorRT v10100]`, the engine was built successfully.
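The success check can be automated by scanning the build output for the `&&&& PASSED TensorRT.trtexec` marker. This is a sketch assuming `trt/activate.sh` has already been sourced in the same environment and that `trt/build_engine.sh` prints the trtexec log to stdout or stderr rather than only to a file; `build_engine` and `build_succeeded` are hypothetical helpers.

```python
# Sketch: run the build script and look for trtexec's success marker.
# Assumes trt/activate.sh was sourced and the script logs to stdout/stderr.
import subprocess

PASS_MARKER = "&&&& PASSED TensorRT.trtexec"


def build_succeeded(log_text):
    """Return True if any output line reports a passing trtexec run."""
    return any(line.strip().startswith(PASS_MARKER) for line in log_text.splitlines())


def build_engine(version=None):
    """Run trt/build_engine.sh, optionally with "1.1" or "1.0", and return its output."""
    cmd = ["sh", "trt/build_engine.sh"]
    if version is not None:
        cmd.append(version)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.stdout + result.stderr
```

Usage: `print(build_succeeded(build_engine()))` for v1.2, or `build_engine("1.1")` for v1.1. A `&&&& FAILED` line, or no marker at all, means the build should be inspected.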

Method 2: Use the prebuilt engine (only for v1.x)

We provide some prebuilt TensorRT engines, which need to be downloaded from Hugging Face.

Supported GPU	Remote Path
GeForce RTX 3090	`engines/RTX3090/model_onnx.plan`
GeForce RTX 4090	`engines/RTX4090/model_onnx.plan`
A100	`engines/A100/model_onnx.plan`
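To pick the right row of the table automatically, you could match the GPU name reported by PyTorch against it. This sketch assumes names of the form `torch.cuda.get_device_name()` typically returns (for example `NVIDIA GeForce RTX 4090` or `NVIDIA A100-SXM4-80GB`); the patterns and the `remote_path_for` helper are hypothetical, and variants such as the RTX 3090 Ti, RTX 4090 D, or RTX A1000 are deliberately not matched because the table does not list them.

```python
# Sketch: map a GPU name to the prebuilt engine's remote path from the table.
# The regex patterns are assumptions about torch.cuda.get_device_name() output.
import re

PREBUILT_ENGINES = [
    # Negative lookahead excludes variants like "RTX 3090 Ti" or "RTX 4090 D".
    (r"\bRTX 3090\b(?!\s+(?:Ti|D)\b)", "engines/RTX3090/model_onnx.plan"),
    (r"\bRTX 4090\b(?!\s+(?:Ti|D)\b)", "engines/RTX4090/model_onnx.plan"),
    # Word boundaries keep "A100" from matching "A1000".
    (r"\bA100\b", "engines/A100/model_onnx.plan"),
]


def remote_path_for(gpu_name):
    """Return the remote path of a prebuilt engine for gpu_name, or None."""
    for pattern, path in PREBUILT_ENGINES:
        if re.search(pattern, gpu_name):
            return path
    return None
```

A `None` result means no prebuilt engine is listed for that GPU, and Method 1 applies.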

Use the following commands to download the engine and place it in the expected location.

Note: Replace <Remote Path> with the corresponding remote path from the table above.

export REMOTE_PATH=<Remote Path>
huggingface-cli download Tencent-Hunyuan/TensorRT-engine ${REMOTE_PATH} --local-dir ./ckpts/t2i/model_trt/engine/
ln -s ${REMOTE_PATH} ./ckpts/t2i/model_trt/engine/model_onnx.plan
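The download and symlink steps can also be sketched in Python with `hf_hub_download` from `huggingface_hub`. This assumes `huggingface_hub` is installed and that, with `local_dir` set, the file lands at `<local_dir>/<remote path>`; `download_engine` and `link_engine` are hypothetical helpers. The symlink target is relative, so it resolves from the engine directory, just like the `ln -s` command.

```python
# Sketch: Python equivalent of the download + `ln -s` commands above.
# Assumes huggingface_hub is installed; helper names are hypothetical.
import os
from pathlib import Path

ENGINE_DIR = Path("./ckpts/t2i/model_trt/engine")


def link_engine(remote_path, engine_dir=ENGINE_DIR):
    """Point engine_dir/model_onnx.plan at engine_dir/<remote_path> via a relative symlink."""
    engine_dir = Path(engine_dir)
    engine_dir.mkdir(parents=True, exist_ok=True)
    link = engine_dir / "model_onnx.plan"
    if link.is_symlink():
        link.unlink()  # replace a link left by an earlier download
    elif link.exists():
        # Refuse to delete a regular file, e.g. an engine built with Method 1.
        raise FileExistsError(f"{link} is a regular file; not overwriting it")
    os.symlink(remote_path, link)  # relative target, resolved from engine_dir
    return link


def download_engine(remote_path, engine_dir=ENGINE_DIR):
    """Download one prebuilt engine from the table and link it into place."""
    from huggingface_hub import hf_hub_download

    hf_hub_download(
        repo_id="Tencent-Hunyuan/TensorRT-engine",
        filename=remote_path,
        local_dir=str(engine_dir),
    )
    return link_engine(remote_path, engine_dir)
```

Usage: `download_engine("engines/RTX4090/model_onnx.plan")`. Unlike `ln -s`, the sketch replaces a stale symlink on reruns but never removes a self-built engine file.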

4. Run the inference using the TensorRT model.

# Important: If you have not activated the environment, please run the following command.
source trt/activate.sh

# Run inference with prompt enhancement and the HunyuanDiT TensorRT model.
# The example prompt "渔舟唱晚" means roughly "fishing boats singing at dusk".
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt

# Disable prompt enhancement (saves GPU memory).
python sample_t2i.py --prompt "渔舟唱晚" --infer-mode trt --no-enhance

5. Notes

For performance reasons, the TensorRT engine supports only the following input shapes. We plan to verify and support arbitrary shapes in the future.

STANDARD_SHAPE = [
    [(1024, 1024), (1280, 1280)],   # 1:1
    [(1280, 960)],                  # 4:3
    [(960, 1280)],                  # 3:4
    [(1280, 768)],                  # 16:9
    [(768, 1280)],                  # 9:16
]
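Because the engine accepts only these shapes, it can help to validate a requested size before running with `--infer-mode trt`. This sketch compares tuples exactly as they appear in `STANDARD_SHAPE` and, following the ratio comments (for example `(1280, 960)` is labelled 4:3), treats first element / second element as the aspect ratio; `is_supported` and `closest_supported` are hypothetical helpers, not part of the repository.

```python
# Sketch: validate a size against STANDARD_SHAPE and suggest the nearest one.
# Aspect ratio is taken as first / second element, per the ratio comments.
STANDARD_SHAPE = [
    [(1024, 1024), (1280, 1280)],   # 1:1
    [(1280, 960)],                  # 4:3
    [(960, 1280)],                  # 3:4
    [(1280, 768)],                  # 16:9
    [(768, 1280)],                  # 9:16
]

# Flatten the groups into one set for O(1) membership checks.
SUPPORTED = {shape for group in STANDARD_SHAPE for shape in group}


def is_supported(shape):
    """Return True if shape exactly matches one of the engine's input shapes."""
    return tuple(shape) in SUPPORTED


def closest_supported(shape):
    """Return the supported shape closest in aspect ratio, breaking ties by area."""
    a, b = shape
    ratio, area = a / b, a * b
    return min(
        SUPPORTED,
        key=lambda s: (abs(s[0] / s[1] - ratio), abs(s[0] * s[1] - area)),
    )
```

For example, `closest_supported((1024, 768))` returns `(1280, 960)`, the only supported 4:3 shape, and `closest_supported((1000, 1000))` prefers `(1024, 1024)` over `(1280, 1280)` because its area is closer.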

❓ Q&A

Please refer to the Q&A for more questions and answers about building the TensorRT Engine.
