---
license: other
license_name: tencent-hunyuan-community
license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
language:
- en
---

# HunyuanDiT LoRA

Language: **English**

## Instructions

The dependencies and installation are basically the same as for the [**original model**](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT). We provide two types of trained LoRA weights for you to test.

Download the model using the following commands:

```bash
cd HunyuanDiT
# Use the huggingface-cli tool to download the model.
huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
```

## Training

We provide three types of weights for fine-tuning HY-DiT LoRA — `ema`, `module` and `distill` — and you can choose according to the actual effect. By default, we use the `ema` weights.

In the example below, we load the `ema` weights into the main model and perform LoRA fine-tuning through the `--ema-to-module` parameter. If you want to load the `module` weights into the main model instead, just remove the `--ema-to-module` parameter. If multiple resolutions are used, you also need to add the `--multireso` and `--reso-step 64` parameters.

```bash
model='DiT-g/2'                            # model type
task_flag="lora_jade_ema_rank64"           # task flag
resume=./ckpts/t2i/model/                  # resume checkpoint
index_file=dataset/index_v2_json/jade.json # index file
results_dir=./log_EXP                      # save root for results
batch_size=1                               # training batch size
image_size=1024                            # training image resolution
grad_accu_steps=2                          # gradient accumulation steps
warmup_num_steps=0                         # warm-up steps
lr=0.0001                                  # learning rate
max_training_steps=2000                    # max training steps
ckpt_every=100                             # create a ckpt every few steps
ckpt_latest_every=2000                     # create a ckpt named `latest.pt` every few steps
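# --- Illustrative note (not part of the original script) --------------------
# The effective batch size per optimizer step is batch_size * grad_accu_steps;
# with the defaults above that is 1 * 2 = 2 images per step, so the recommended
# 2000 steps cover roughly 2000 * 2 / 100 = 40 passes over a 100-image dataset.
effective_batch=$((batch_size * grad_accu_steps))
echo "effective batch size per optimizer step: ${effective_batch}"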
rank=64                                    # rank of lora

PYTHONPATH=./ deepspeed hydit/train_large_deepspeed.py \
    --task-flag ${task_flag} \
    --model ${model} \
    --training_parts lora \
    --rank ${rank} \
    --resume-split \
    --resume ${resume} \
    --ema-to-module \
    --lr ${lr} \
    --noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
    --predict-type v_prediction \
    --uncond-p 0.44 \
    --uncond-p-t5 0.44 \
    --index-file ${index_file} \
    --random-flip \
    --batch-size ${batch_size} \
    --image-size ${image_size} \
    --global-seed 999 \
    --grad-accu-steps ${grad_accu_steps} \
    --warmup-num-steps ${warmup_num_steps} \
    --use-flash-attn \
    --use-fp16 \
    --ema-dtype fp32 \
    --results-dir ${results_dir} \
    --ckpt-every ${ckpt_every} \
    --max-training-steps ${max_training_steps} \
    --ckpt-latest-every ${ckpt_latest_every} \
    --log-every 10 \
    --deepspeed \
    --deepspeed-optimizer \
    --use-zero-stage 2 \
    --qk-norm \
    --rope-img base512 \
    --rope-real \
    "$@"
```

Recommended parameter settings:

| Parameter | Description | Recommended Value | Note |
|:---------------:|:---------:|:---------------------------------------------------:|:--:|
| `--batch-size` | Training batch size | 1 | Depends on GPU memory |
| `--grad-accu-steps` | Gradient accumulation steps | 2 | - |
| `--rank` | Rank of the LoRA | 64 | Values from 8 to 128 are all possible |
| `--max-training-steps` | Training steps | 2000 | Varies with the amount of training data; about 2000 steps are enough for 100 images |
| `--lr` | Learning rate | 0.0001 | - |

## Inference

### Using Gradio

Make sure you have activated the conda environment before running the following commands.

> ⚠️ Important Reminder:
> We recommend not using prompt enhancement, as it may lead to the disappearance of style words.

```shell
# jade style
# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Start with the English UI.
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# porcelain style
# By default, we start a Chinese UI.
python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Using Flash Attention for acceleration.
python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# You can disable the enhancement model if the GPU memory is insufficient.
# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Start with the English UI.
python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
```

### Using Command Line

We provide several commands for a quick start:

```shell
# jade style (prompt: "jade painting style, a cat is chasing a butterfly")
# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Only Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# Generate an image with other image sizes.
python sample_t2i.py --prompt "玉石绘画风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

# porcelain style (prompt: "blue-and-white porcelain style, a cat is chasing a butterfly")
# Prompt Enhancement + Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Only Text-to-Image. Torch mode
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Only Text-to-Image. Flash Attention mode
python sample_t2i.py --infer-mode fa --prompt "青花瓷风格,一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

# Generate an image with other image sizes.
python sample_t2i.py --prompt "青花瓷风格,一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
```

More example prompts can be found in [example_prompts.txt](example_prompts.txt).
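To try several prompts in one run, a small wrapper loop can feed `sample_t2i.py` one line of `example_prompts.txt` at a time. This is a sketch, not part of the original repository; it assumes the file holds one prompt per line, and uses the jade LoRA with prompt enhancement disabled:

```shell
#!/usr/bin/env bash
# Sketch: generate one image per prompt line (assumes one prompt per line
# in example_prompts.txt; adjust --lora_ckpt to the style you want).
while IFS= read -r prompt; do
  [ -z "$prompt" ] && continue   # skip empty lines
  python sample_t2i.py --no-enhance --load-key ema \
    --lora_ckpt ./ckpts/t2i/lora/jade --prompt "$prompt"
done < example_prompts.txt
```

Running the prompts sequentially keeps only one model instance in GPU memory at a time.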