# Fine-tuning OmniGen

Fine-tuning OmniGen can help you better handle specific image generation tasks. For example, by fine-tuning on images of a particular person, you can generate multiple pictures of that person while maintaining identity consistency.

Much previous work has focused on designing new networks for specific tasks. For instance, ControlNet was proposed to handle image conditions, and IP-Adapter was built to preserve identity features. To support a new task, you had to build a new architecture and debug it repeatedly; adding and tuning extra network parameters is time-consuming and labor-intensive, which is neither user-friendly nor cost-efficient. With OmniGen, all of this becomes much simpler.

By comparison, OmniGen accepts multi-modal conditional inputs and has been pre-trained on a variety of tasks. You can fine-tune it on any task without designing a task-specific network such as ControlNet or IP-Adapter.

**All you need to do is prepare the data and start training. You can break the limitations of previous models, allowing OmniGen to accomplish a variety of interesting tasks, even ones that have never been done before.**


## Installation

```bash
git clone https://github.com/VectorSpaceLab/OmniGen.git
cd OmniGen
pip install -e .
```


## Full fine-tuning

### Fine-tuning command

```bash
accelerate launch \
    --num_processes=1 \
    --use_fsdp \
    --fsdp_offload_params false \
    --fsdp_sharding_strategy SHARD_GRAD_OP \
    --fsdp_auto_wrap_policy TRANSFORMER_BASED_WRAP \
    --fsdp_transformer_layer_cls_to_wrap Phi3DecoderLayer \
    --fsdp_state_dict_type FULL_STATE_DICT \
    --fsdp_forward_prefetch false \
    --fsdp_use_orig_params True \
    --fsdp_cpu_ram_efficient_loading false \
    --fsdp_sync_module_states True \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --json_file ./toy_data/toy_data.jsonl \
    --image_path ./toy_data/images \
    --batch_size_per_device 1 \
    --lr 2e-5 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 100 \
    --log_every 1 \
    --results_dir ./results/toy_finetune
```

Some important arguments:
- `num_processes`: number of GPUs to use for training
- `model_name_or_path`: path to the pretrained model
- `json_file`: path to the json file containing the training data, e.g., `./toy_data/toy_data.jsonl`
- `image_path`: path to the image folder, e.g., `./toy_data/images`
- `batch_size_per_device`: batch size per device
- `lr`: learning rate
- `keep_raw_resolution`: whether to keep the original resolution of each image; if not set, all images are resized to (max_image_size, max_image_size)
- `max_image_size`: maximum image size
- `gradient_accumulation_steps`: number of steps over which to accumulate gradients
- `ckpt_every`: number of steps between checkpoint saves
- `epochs`: number of training epochs
- `log_every`: number of steps between log outputs
- `results_dir`: path to the results folder
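As a rough guide (this is the standard rule for data-parallel training with gradient accumulation, assumed here to apply to `train.py` as well), the effective batch size is the product of `batch_size_per_device`, `num_processes`, and `gradient_accumulation_steps`:

```python
# Effective batch size for the command above (values from the example).
batch_size_per_device = 1
num_processes = 1
gradient_accumulation_steps = 1

effective_batch_size = (
    batch_size_per_device * num_processes * gradient_accumulation_steps
)
print(effective_batch_size)  # → 1
```

Increasing `gradient_accumulation_steps` lets you simulate a larger batch without using more GPU memory.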

The data format of json_file is as follows:

```
{
    "instruction": str,
    "input_images": [str, str, ...],
    "output_images": str
}
```

You can see a toy example in `./toy_data/toy_data.jsonl`.
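For illustration, here is a minimal sketch of writing one training record in this format; the file name and image paths below are hypothetical, and paths are interpreted relative to `--image_path`:

```python
import json

# Hypothetical training record following the schema above.
record = {
    "instruction": "a photo of sks dog",           # text prompt for the task
    "input_images": ["images/dog_reference.png"],  # conditioning images; may be empty
    "output_images": "images/dog_target.png",      # target image the model should produce
}

# Each line of the jsonl file is one such record.
with open("my_data.jsonl", "w") as f:
    f.write(json.dumps(record) + "\n")
```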


If an OOM (Out of Memory) error occurs, try decreasing `batch_size_per_device` or `max_image_size`. You can also use LoRA instead of full fine-tuning.


### Inference

The checkpoints can be found at `{results_dir}/checkpoints/*`. You can use the following code to load a saved checkpoint:
```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("checkpoint_path")  # e.g., ./results/toy_finetune/checkpoints/0000200
```





## LoRA fine-tuning
LoRA fine-tuning is a simple way to fine-tune OmniGen with less GPU memory. To use LoRA, add `--use_lora` and `--lora_rank` to the command.

```bash
accelerate launch \
    --num_processes=1 \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 3e-4 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 100 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora
```

### Inference

The checkpoints can be found at `{results_dir}/checkpoints/*`. You can use the following code to load the base model and merge the LoRA weights:
```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path")  # e.g., ./results/toy_finetune_lora/checkpoints/0000100
```


## A simple example

Here is an example of learning a new concept: "sks dog". We use five images of one dog from [dog-example](https://huggingface.co/datasets/diffusers/dog-example).

The json file is `./toy_data/toy_subject_data.jsonl`, and the images have been saved in `./toy_data/images`.

```bash
accelerate launch \
    --num_processes=1 \
    train.py \
    --model_name_or_path Shitao/OmniGen-v1 \
    --batch_size_per_device 2 \
    --condition_dropout_prob 0.01 \
    --lr 1e-3 \
    --use_lora \
    --lora_rank 8 \
    --json_file ./toy_data/toy_subject_data.jsonl \
    --image_path ./toy_data/images \
    --max_input_length_limit 18000 \
    --keep_raw_resolution \
    --max_image_size 1024 \
    --gradient_accumulation_steps 1 \
    --ckpt_every 100 \
    --epochs 200 \
    --log_every 1 \
    --results_dir ./results/toy_finetune_lora
```

After training, you can use the following command to generate images:
```python
from OmniGen import OmniGenPipeline

pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
pipe.merge_lora("checkpoint_path")  # e.g., ./results/toy_finetune_lora/checkpoints/0000200

images = pipe(
    prompt="a photo of sks dog running in the snow",
    height=1024,
    width=1024,
    guidance_scale=3
)
images[0].save("example_sks_dog_snow.png")
```