# Fine-tuning OmniGen
Fine-tuning OmniGen helps you handle specific image generation tasks better. For example, by fine-tuning on images of one person, you can generate multiple pictures of that person while maintaining consistency.
Much previous work focused on designing new networks for specific tasks. For instance, ControlNet was proposed to handle image conditions, and IP-Adapter was built to preserve ID features. With that approach, each new task requires building a new architecture and debugging it repeatedly. Adding and tuning extra network parameters is usually time-consuming and labor-intensive, which makes it neither user-friendly nor cost-efficient. With OmniGen, all of this becomes much simpler.
By comparison, OmniGen accepts multi-modal conditional inputs and has been pre-trained on a variety of tasks. You can fine-tune it on any task without designing specialized networks such as ControlNet or IP-Adapter.
**All you need to do is prepare the data and start training. This lets you move past the limitations of previous models and have OmniGen accomplish a variety of interesting tasks, even ones that have never been done before.**
## Installation
```bash
git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .
```
## Full fine-tuning
### Fine-tuning command
```bash
accelerate launch \
--num_processes=1 \
--use_fsdp \
--fsdp_offload_params false \
--fsdp_sharding_strategy SHARD_GRAD_OP \
--fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
--fsdp_transformer_layer_cls_to_wrap Phi3DecoderLayer \
--fsdp_state_dict_type FULL_STATE_DICT \
--fsdp_forward_prefetch false \
--fsdp_use_orig_params True \
--fsdp_cpu_ram_efficient_loading false \
--fsdp_sync_module_states True \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--json_file ./toy_data/toy_data.jsonl \
--image_path ./toy_data/images \
--batch_size_per_device 1 \
--lr 2e-5 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 100 \
--log_every 1 \
--results_dir ./results/toy_finetune
```
Some important arguments:
- `num_processes`: number of GPUs to use for training
- `model_name_or_path`: path to the pretrained model
- `json_file`: path to the json file containing the training data, e.g., ./toy_data/toy_data.jsonl
- `image_path`: path to the image folder, e.g., ./toy_data/images
- `batch_size_per_device`: batch size per device
- `lr`: learning rate
- `keep_raw_resolution`: whether to keep each image's original resolution; if not set, all images are resized to (max_image_size, max_image_size)
- `max_image_size`: maximum image size
- `gradient_accumulation_steps`: number of steps over which gradients are accumulated
- `ckpt_every`: interval, in steps, between saved checkpoints
- `epochs`: number of epochs
- `log_every`: interval, in steps, between log outputs
- `results_dir`: path to the results folder
Each line of `json_file` is a JSON object in the following format:
```
{
"instruction": str,
"input_images": [str, str, ...],
"output_images": str
}
```
You can see a toy example in `./toy_data/toy_data.jsonl`.
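To build your own training file, a minimal sketch along these lines may help. It writes one JSON object per line using the keys from the schema above. The helper names (`write_jsonl`, `read_jsonl`) and the image file names are hypothetical, and the sketch assumes image entries are paths resolved against `--image_path`; compare the result against `./toy_data/toy_data.jsonl` before training.

```python
import json
from pathlib import Path


def write_jsonl(records, path):
    """Write one JSON object per line (JSON Lines)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Hypothetical records; keys are copied from the schema above,
# file names are placeholders for your own images.
example_records = [
    {
        "instruction": "a photo of sks dog",
        "input_images": [],
        "output_images": "dog_01.jpg",
    },
    {
        "instruction": "a photo of sks dog sitting on grass",
        "input_images": [],
        "output_images": "dog_02.jpg",
    },
]

# Usage: write_jsonl(example_records, "./my_data/my_data.jsonl")
```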
If an out-of-memory (OOM) error occurs, try decreasing `batch_size_per_device` or `max_image_size`. You can also use LoRA instead of full fine-tuning.
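When you lower `batch_size_per_device` to save memory, you may want to raise `gradient_accumulation_steps` so that each optimizer update still sees the same number of samples. The sketch below shows the usual data-parallel arithmetic; it assumes `train.py` follows the common convention that the effective batch size is the product of the three values, and both function names are hypothetical.

```python
def effective_batch_size(num_processes, batch_size_per_device,
                         gradient_accumulation_steps):
    """Samples contributing to one optimizer update (assumed convention)."""
    return num_processes * batch_size_per_device * gradient_accumulation_steps


def accumulation_steps_for(target_batch, num_processes, batch_size_per_device):
    """Accumulation steps needed to reach target_batch with a given per-device batch."""
    per_step = num_processes * batch_size_per_device
    if target_batch % per_step != 0:
        raise ValueError("target_batch must be a multiple of "
                         "num_processes * batch_size_per_device")
    return target_batch // per_step


# Example: halving the per-device batch from 2 to 1 on one GPU
# needs 2 accumulation steps to keep an effective batch of 2.
steps = accumulation_steps_for(target_batch=2, num_processes=1,
                               batch_size_per_device=1)
```

The resulting value would be passed as `--gradient_accumulation_steps`.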
### Inference
Checkpoints are saved under `{results_dir}/checkpoints/*`. You can load a saved checkpoint with the following code:
```python
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("checkpoint_path") # e.g., ./results/toy_finetune/checkpoints/0000200
```
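If you save many checkpoints, a small helper can pick the most recent one. This is a sketch that assumes checkpoint directories are named with zero-padded step numbers, as in the example path above (`0000200`); the function name `latest_checkpoint` is hypothetical and not part of OmniGen.

```python
from pathlib import Path
from typing import Optional


def latest_checkpoint(results_dir: str) -> Optional[Path]:
    """Return the checkpoint directory with the highest step number, or None."""
    ckpt_root = Path(results_dir) / "checkpoints"
    if not ckpt_root.is_dir():
        return None
    # Keep only directories whose names are purely numeric (assumed naming scheme).
    numbered = [p for p in ckpt_root.iterdir() if p.is_dir() and p.name.isdigit()]
    return max(numbered, key=lambda p: int(p.name), default=None)


# Usage: path = latest_checkpoint("./results/toy_finetune")
# then pass str(path) to OmniGenPipeline.from_pretrained(...)
```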
## LoRA fine-tuning
LoRA fine-tuning is a simple way to fine-tune OmniGen with less GPU memory. To use LoRA, add `--use_lora` and `--lora_rank` to the command.
```bash
accelerate launch \
--num_processes=1 \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 3e-4 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 100 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
```
### Inference
Checkpoints are saved under `{results_dir}/checkpoints/*`. You can load a LoRA checkpoint with the following code:
```python
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path") # e.g., ./results/toy_finetune_lora/checkpoints/0000100
```
## A simple example
Here is an example of learning a new concept: "sks dog". We use five images of one dog from [dog-example](https://huggingface.co/datasets/diffusers/dog-example).
The json file is `./toy_data/toy_subject_data.jsonl`, and the images have been saved in `./toy_data/images`.
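Before launching a long run, it can be worth checking that every image referenced in the json file exists in the image folder. The sketch below is one way to do that; it assumes image entries are paths relative to `--image_path` and uses the key names from the data format described earlier. The function name `missing_images` is hypothetical.

```python
import json
from pathlib import Path


def missing_images(json_file, image_path, output_key="output_images"):
    """Return (line_number, image_name) pairs for referenced files that do not exist."""
    image_root = Path(image_path)
    missing = []
    with open(json_file, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            names = list(record.get("input_images") or [])
            output_name = record.get(output_key)
            if output_name:
                names.append(output_name)
            for name in names:
                # Assumption: names are resolved against the image folder.
                if not (image_root / name).is_file():
                    missing.append((line_no, name))
    return missing


# Usage:
# problems = missing_images("./toy_data/toy_subject_data.jsonl", "./toy_data/images")
# An empty list means every referenced file was found.
```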
```bash
accelerate launch \
--num_processes=1 \
train.py \
--model_name_or_path Shitao/OmniGen-v1 \
--batch_size_per_device 2 \
--condition_dropout_prob 0.01 \
--lr 1e-3 \
--use_lora \
--lora_rank 8 \
--json_file ./toy_data/toy_subject_data.jsonl \
--image_path ./toy_data/images \
--max_input_length_limit 18000 \
--keep_raw_resolution \
--max_image_size 1024 \
--gradient_accumulation_steps 1 \
--ckpt_every 100 \
--epochs 200 \
--log_every 1 \
--results_dir ./results/toy_finetune_lora
```
After training, you can generate images with the following code:
```python
from OmniGen import OmniGenPipeline
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path") # e.g., ./results/toy_finetune_lora/checkpoints/0000200
images = pipe(
    prompt="a photo of sks dog running in the snow",
    height=1024,
    width=1024,
    guidance_scale=3
)
images[0].save("example_sks_dog_snow.png")
```