HYDiT-LoRA / README.md

Update README.md

35dea3e verified 5 months ago

7.05 kB

	---
	license: other
	license_name: tencent-hunyuan-community
	license_link: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT/blob/main/LICENSE.txt
	language:
	- en
	---
	# HunyuanDiT LoRA

	Language: English

	## Instructions

	The dependencies and installation are basically the same as the [original model](https://huggingface.co/Tencent-Hunyuan/HunyuanDiT).

	We provide two types of trained LoRA weights for you to test.

	Then download the model using the following commands:

	```bash
	cd HunyuanDiT
	# Use the huggingface-cli tool to download the model.
	huggingface-cli download Tencent-Hunyuan/HYDiT-LoRA --local-dir ./ckpts/t2i/lora
	```

	## Training

	We provide three types of weights for fine-tuning HY-DiT LoRA, `ema`, `module` and `distill`, and you can choose according to the actual effect. By default, we use `ema` weights.

	Here is an example, we load the `ema` weights into the main model and perform LoRA fine-tuning through the `--ema-to-module` parameter.

	If you want to load the `module` weights into the main model, just remove the `--ema-to-module` parameter.

	If multiple resolution are used, you need to add the `--multireso` and `--reso-step 64 ` parameter.

	```bash
	model='DiT-g/2' # model type
	task_flag="lora_jade_ema_rank64" # task flag
	resume=./ckpts/t2i/model/ # resume checkpoint
	index_file=dataset/index_v2_json/jade.json # index file
	results_dir=./log_EXP # save root for results
	batch_size=1 # training batch size
	image_size=1024 # training image resolution
	grad_accu_steps=2 # gradient accumulation steps
	warmup_num_steps=0 # warm-up steps
	lr=0.0001 # learning rate
	ckpt_every=100 # create a ckpt every a few steps.
	ckpt_latest_every=2000 # create a ckpt named `latest.pt` every a few steps.
	rank=64 # rank of lora

	PYTHONPATH=./ deepspeed hydit/train_large_deepspeed.py \
	--task-flag ${task_flag} \
	--model ${model} \
	--training_parts lora \
	--rank ${rank} \
	--resume-split \
	--resume ${resume} \
	--ema-to-module \
	--lr ${lr} \
	--noise-schedule scaled_linear --beta-start 0.00085 --beta-end 0.03 \
	--predict-type v_prediction \
	--uncond-p 0.44 \
	--uncond-p-t5 0.44 \
	--index-file ${index_file} \
	--random-flip \
	--batch-size ${batch_size} \
	--image-size ${image_size} \
	--global-seed 999 \
	--grad-accu-steps ${grad_accu_steps} \
	--warmup-num-steps ${warmup_num_steps} \
	--use-flash-attn \
	--use-fp16 \
	--ema-dtype fp32 \
	--results-dir ${results_dir} \
	--ckpt-every ${ckpt_every} \
	--max-training-steps ${max_training_steps}\
	--ckpt-latest-every ${ckpt_latest_every} \
	--log-every 10 \
	--deepspeed \
	--deepspeed-optimizer \
	--use-zero-stage 2 \
	--qk-norm \
	--rope-img base512 \
	--rope-real \
	"$@"
	```

	Recommended parameter settings

	\| Parameter \| Description \| Recommended Parameter Value \| Note\|
	\|:---------------:\|:---------:\|:---------------------------------------------------:\|:--:\|
	\| `--batch_size` \| Training batch size \| 1 \| Depends on GPU memory\|
	\| `--grad-accu-steps` \| Size of gradient accumulation \| 2 \| - \|
	\| `--rank` \| Rank of lora \| 64 \| 8-128 are all possible\|
	\| `--max-training-steps` \| Training steps \| 2000 \| Varies with the amount of training data, about 2000 steps are enough for 100 images\|
	\| `--lr` \| Learning rate \| 0.0001 \| - \|



	## Inference

	### Using Gradio

	Make sure you have activated the conda environment before running the following command.

	> ⚠️ Important Reminder:
	> We recommend not using prompt enhance, as it may lead to the disappearance of style words.

	```shell
	# porcelain style

	# By default, we start a Chinese UI.
	python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# Using Flash Attention for acceleration.
	python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# You can disable the enhancement model if the GPU memory is insufficient.
	# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
	python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# Start with English UI
	python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# jade style

	# By default, we start a Chinese UI.
	python app/hydit_app.py --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# Using Flash Attention for acceleration.
	python app/hydit_app.py --infer-mode fa --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# You can disable the enhancement model if the GPU memory is insufficient.
	# The enhancement will be unavailable until you restart the app without the `--no-enhance` flag.
	python app/hydit_app.py --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# Start with English UI
	python app/hydit_app.py --lang en --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
	```


	### Using Command Line

	We provide several commands to quick start:

	```shell
	# porcelain style

	# Prompt Enhancement + Text-to-Image. Torch mode
	python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# Only Text-to-Image. Torch mode
	python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# Only Text-to-Image. Flash Attention mode
	python sample_t2i.py --infer-mode fa --prompt "玉石绘画风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# Generate an image with other image sizes.
	python sample_t2i.py --prompt "玉石绘画风格，一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/jade

	# jade style

	# Prompt Enhancement + Text-to-Image. Torch mode
	python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# Only Text-to-Image. Torch mode
	python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶" --no-enhance --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# Only Text-to-Image. Flash Attention mode
	python sample_t2i.py --infer-mode fa --prompt "青花瓷风格，一只猫在追蝴蝶" --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain

	# Generate an image with other image sizes.
	python sample_t2i.py --prompt "青花瓷风格，一只猫在追蝴蝶" --image-size 1280 768 --load-key ema --lora_ckpt ./ckpts/t2i/lora/porcelain
	```

	More example prompts can be found in [example_prompts.txt](example_prompts.txt)