Shitao committed on
Commit f797c35 • 1 Parent(s): 4636aeb

Update README.md

Files changed (1)
  1. README.md +70 -32
README.md CHANGED
@@ -4,12 +4,11 @@ pipeline_tag: text-to-image
  tags:
  - image-to-image
  ---
-
  <h1 align="center">OmniGen: Unified Image Generation</h1>


  <p align="center">
- <a href="">
  <img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
  </a>
  <a href="https://arxiv.org/abs/2409.11340">
@@ -20,12 +19,15 @@ tags:
  </a>
  <a href="https://huggingface.co/Shitao/OmniGen-v1">
  <img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
  </a>
  </p>

  <h4 align="center">
  <p>
- <a href=#2-news>News</a> |
  <a href=#3-methodology>Methodology</a> |
  <a href=#4-what-can-omnigen-do>Capabilities</a> |
  <a href=#5-quick-start>Quick Start</a> |
@@ -35,23 +37,24 @@ tags:
  <p>
  </h4>

- More information please refer to our github repo: https://github.com/VectorSpaceLab/OmniGen

- ## 1. Overview

- OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.

- Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, **we believe that the future image generation paradigm should be more simple and flexible, that is, generating various images directly through arbitrarily multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.**

- Due to the limited resources, OmniGen still has room for improvement. We will continue to optimize it, and hope it inspire more universal image generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks; you just need to prepare the corresponding data, and then run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image generation task, and perhaps we can achieve very interesting, wonderful and creative things.

- If you have any questions, ideas or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.

- ## 2. News
- - 2024-10-22: :fire: We release the code for OmniGen. Inference: [docs/inference.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md) Train: [docs/fine-tuning.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tuning.md)
- - 2024-10-22: :fire: We release the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co/spaces/Shitao/OmniGen)
@@ -60,11 +63,17 @@ If you have any questions, ideas or interesting tasks you want OmniGen to accomp
  You can see details in our [paper](https://arxiv.org/abs/2409.11340).


  ## 4. What Can OmniGen do?
- ![demo](./demo_cases.png)

- OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, Identity-Preserving Generation, image editing, and image-conditioned generation. **OmniGen don't need additional plugins or operations, it can automatically identify the features (e.g., required object, human pose, depth mapping) in input images according the text prompt.**
- We showcase some examples in [inference.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference.ipynb). And in [inference_demo.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference_demo.ipynb), we show a insteresting pipeline to generate and modify a image.

  If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).
@@ -74,24 +83,36 @@ If you are not entirely satisfied with certain functionalities or wish to add ne


  ### Using OmniGen
- Install via Github(Recommend):
  ```bash
  git clone https://github.com/staoxiao/OmniGen.git
  cd OmniGen
  pip install -e .
  ```
- or via pypi:
- ```bash
- pip install OmniGen
  ```
  Here are some examples:
  ```python
  from OmniGen import OmniGenPipeline

- pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")

- # Text to Image
  images = pipe(
  prompt="A curly-haired man in a red shirt is drinking tea.",
  height=1024,
@@ -101,25 +122,27 @@ images = pipe(
  )
  images[0].save("example_t2i.png") # save output PIL Image

- # Multi-modal to Image
- # In prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
  # You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
  images = pipe(
- prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>."
- input_images=["./imgs/test_cases/two_man.jpg"]
  height=1024,
  width=1024,
- separate_cfg_infer=False, # if OOM, you can set separate_cfg_infer=True
- guidance_scale=3,
- img_guidance_scale=1.6
  )
  images[0].save("example_ti2i.png") # save output PIL image
  ```
- For more details about the argument in inference, please refer to [docs/inference.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/inference.md).
- For more examples for image generation, you can refer to [inference.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference.ipynb) and [inference_demo.ipynb](https://github.com/VectorSpaceLab/OmniGen/blob/main/inference_demo.ipynb)


  ### Using Diffusers
  Coming soon.

@@ -127,12 +150,22 @@ Coming soon.

  We construct an online demo in [Huggingface](https://huggingface.co/spaces/Shitao/OmniGen).

- For the local gradio demo, you can run:
  ```python
  python app.py
  ```


  ## 6. Finetune
  We provide a training script `train.py` to fine-tune OmniGen.
@@ -157,9 +190,14 @@ accelerate launch --num_processes=1 train.py \
  --results_dir ./results/toy_finetune_lora
  ```

- Please refer to [docs/finetune.md](https://github.com/VectorSpaceLab/OmniGen/blob/main/docs/fine-tune.md) for more details (e.g. full finetune).

  ## License
  This repo is licensed under the [MIT License](LICENSE).
 
  tags:
  - image-to-image
  ---
  <h1 align="center">OmniGen: Unified Image Generation</h1>


  <p align="center">
+ <a href="https://vectorspacelab.github.io/OmniGen/">
  <img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
  </a>
  <a href="https://arxiv.org/abs/2409.11340">
 
  </a>
  <a href="https://huggingface.co/Shitao/OmniGen-v1">
  <img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
+ </a>
+ <a href="https://replicate.com/chenxwh/omnigen">
+ <img alt="Build" src="https://replicate.com/chenxwh/omnigen/badge">
  </a>
  </p>

  <h4 align="center">
  <p>
+ <a href=#1-news>News</a> |
  <a href=#3-methodology>Methodology</a> |
  <a href=#4-what-can-omnigen-do>Capabilities</a> |
  <a href=#5-quick-start>Quick Start</a> |
 
  <p>
  </h4>


+ ## 1. News
+ - 2024-11-03: Added Replicate Demo and API: [![Replicate](https://replicate.com/chenxwh/omnigen/badge)](https://replicate.com/chenxwh/omnigen)
+ - 2024-10-28: We release a new version of the inference code, optimizing memory usage and time cost. You can refer to [docs/inference.md](docs/inference.md#requiremented-resources) for detailed information.
+ - 2024-10-22: :fire: We release the code for OmniGen. Inference: [docs/inference.md](docs/inference.md) Train: [docs/fine-tuning.md](docs/fine-tuning.md)
+ - 2024-10-22: :fire: We release the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co/spaces/Shitao/OmniGen)


+ ## 2. Overview

+ OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.

+ Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, **we believe that the future image generation paradigm should be simpler and more flexible, that is, generating various images directly from arbitrary multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.**

+ Due to limited resources, OmniGen still has room for improvement. We will continue to optimize it, and hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks; you just need to prepare the corresponding data and then run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.

+ If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.

 
  You can see details in our [paper](https://arxiv.org/abs/2409.11340).


+
  ## 4. What Can OmniGen do?

+ OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, identity-preserving generation, image editing, and image-conditioned generation. **OmniGen doesn't need additional plugins or operations; it can automatically identify the features (e.g., required objects, human pose, depth map) in input images according to the text prompt.**
+ We showcase some examples in [inference.ipynb](inference.ipynb). And in [inference_demo.ipynb](inference_demo.ipynb), we show an interesting pipeline to generate and modify an image.
+
+ Here are illustrations of OmniGen's capabilities:
+ - You can control the image generation flexibly via OmniGen
+ ![demo](./imgs/demo_cases.png)
+ - Referring Expression Generation: You can input multiple images and use simple, general language to refer to the objects within those images. OmniGen can automatically recognize the necessary objects in each image and generate new images based on them. No additional operations, such as image cropping or face detection, are required.
+ ![demo](./imgs/referring.png)

  If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).


  ### Using OmniGen
+ Install via GitHub:
  ```bash
  git clone https://github.com/staoxiao/OmniGen.git
  cd OmniGen
  pip install -e .
  ```
+
+ You can also create a new environment to avoid conflicts:
+ ```bash
+ # Create a Python 3.10.12 conda env (you could also use virtualenv)
+ conda create -n omnigen python=3.10.12
+ conda activate omnigen
+
+ # Install PyTorch with your CUDA version, e.g.
+ pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
+
+ git clone https://github.com/staoxiao/OmniGen.git
+ cd OmniGen
+ pip install -e .
  ```

  Here are some examples:
  ```python
  from OmniGen import OmniGenPipeline

+ pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
+ # Note: Your local model path is also acceptable, such as 'pipe = OmniGenPipeline.from_pretrained(your_local_model_path)', where all files in your_local_model_path should be organized as https://huggingface.co/Shitao/OmniGen-v1/tree/main

+
+ ## Text to Image
  images = pipe(
  prompt="A curly-haired man in a red shirt is drinking tea.",
  height=1024,

  )
  images[0].save("example_t2i.png") # save output PIL Image

+ ## Multi-modal to Image
+ # In the prompt, we use a placeholder to represent each image. The image placeholder should be in the format of <img><|image_*|></img>
  # You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
  images = pipe(
+ prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
+ input_images=["./imgs/test_cases/two_man.jpg"],
  height=1024,
  width=1024,
+ guidance_scale=2.5,
+ img_guidance_scale=1.6,
+ seed=0
  )
  images[0].save("example_ti2i.png") # save output PIL image
  ```
+ - If out of memory, you can set `offload_model=True`. If the inference time is too long when inputting multiple images, you can reduce `max_input_image_size`. For the required resources and how to run OmniGen efficiently, please refer to [docs/inference.md#requiremented-resources](docs/inference.md#requiremented-resources).
+ - For more examples of image generation, you can refer to [inference.ipynb](inference.ipynb) and [inference_demo.ipynb](inference_demo.ipynb).
+ - For more details about the arguments in inference, please refer to [docs/inference.md](docs/inference.md).
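As an aside, the placeholder convention described above can be checked mechanically before calling the pipeline. The sketch below is a small illustrative helper (not part of the OmniGen API; the function name is hypothetical) that verifies a prompt contains exactly one `<img><|image_i|></img>` placeholder, numbered from 1, per entry in `input_images`:

```python
import re

def check_image_placeholders(prompt: str, input_images: list) -> bool:
    """Return True if the prompt has one <img><|image_i|></img> placeholder
    per input image, numbered 1..N. Illustrative helper, not part of OmniGen."""
    found = re.findall(r"<img><\|image_(\d+)\|></img>", prompt)
    expected = [str(i) for i in range(1, len(input_images) + 1)]
    return sorted(found) == sorted(expected)

# One image, one placeholder: OK
print(check_image_placeholders(
    "The man is the right man in <img><|image_1|></img>.",
    ["./imgs/test_cases/two_man.jpg"],
))  # True

# Two images but only one placeholder: mismatch
print(check_image_placeholders(
    "Combine <img><|image_1|></img> with something.",
    ["a.jpg", "b.jpg"],
))  # False
```

A check like this catches the most common multi-modal prompt mistake (a missing or duplicated placeholder) before any model weights are loaded.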


  ### Using Diffusers
+
  Coming soon.



  We construct an online demo in [Huggingface](https://huggingface.co/spaces/Shitao/OmniGen).

+ For the local gradio demo, you need to install Gradio (`pip install gradio spaces`), and then you can run:
  ```bash
+ pip install gradio spaces
  python app.py
  ```

+ #### Use Google Colab
+ To run the demo in Google Colab, use the following commands:

+ ```
+ !git clone https://github.com/staoxiao/OmniGen.git
+ %cd OmniGen
+ !pip install -e .
+ !pip install gradio spaces
+ !python app.py --share
+ ```

  ## 6. Finetune
  We provide a training script `train.py` to fine-tune OmniGen.

  --results_dir ./results/toy_finetune_lora
  ```

+ Please refer to [docs/fine-tuning.md](docs/fine-tuning.md) for more details (e.g., full fine-tuning).

+ ### Contributors
+ We thank all our contributors for their efforts and warmly welcome new members to join!

+ <a href="https://github.com/VectorSpaceLab/OmniGen/graphs/contributors">
+ <img src="https://contrib.rocks/image?repo=VectorSpaceLab/OmniGen" />
+ </a>

  ## License
  This repo is licensed under the [MIT License](LICENSE).