Rypo committed
Commit: c0e85b3
1 Parent(s): 5965218
remove vae, trim readme
- .gitattributes +0 -1
- README.md +2 -213
- demo_cases.png +0 -3
- vae/config.json +0 -31
- vae/diffusion_pytorch_model.safetensors +0 -3
.gitattributes
CHANGED
@@ -33,7 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
36 |
-
demo_cases.png filter=lfs diff=lfs merge=lfs -text
|
37 |
assets/text_only_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
38 |
assets/single_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
39 |
assets/double_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
|
|
33 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
34 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
35 |
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
36 |
assets/text_only_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
37 |
assets/single_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
38 |
assets/double_img_1111_4bit_bf16.png filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
@@ -8,219 +8,8 @@ tags:
|
|
8 |
---
|
9 |
|
10 |
> [!NOTE]
|
11 |
-
> This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1).
|
12 |
|
13 |
<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
|
14 |
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
|
15 |
-
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">
|
16 |
-
|
17 |
-
Original model card:
|
18 |
-
|
19 |
-
---
|
20 |
-
|
21 |
-
<h1 align="center">OmniGen: Unified Image Generation</h1>
|
22 |
-
|
23 |
-
More information please refer to our repo: https://github.com/VectorSpaceLab/OmniGen
|
24 |
-
|
25 |
-
<p align="center">
|
26 |
-
<a href="https://vectorspacelab.github.io/OmniGen/">
|
27 |
-
<img alt="Build" src="https://img.shields.io/badge/Project%20Page-OmniGen-yellow">
|
28 |
-
</a>
|
29 |
-
<a href="https://arxiv.org/abs/2409.11340">
|
30 |
-
<img alt="Build" src="https://img.shields.io/badge/arXiv%20paper-2409.11340-b31b1b.svg">
|
31 |
-
</a>
|
32 |
-
<a href="https://huggingface.co/spaces/Shitao/OmniGen">
|
33 |
-
<img alt="License" src="https://img.shields.io/badge/HF%20Demo-🤗-lightblue">
|
34 |
-
</a>
|
35 |
-
<a href="https://huggingface.co/Shitao/OmniGen-v1">
|
36 |
-
<img alt="Build" src="https://img.shields.io/badge/HF%20Model-🤗-yellow">
|
37 |
-
</a>
|
38 |
-
<a href="https://replicate.com/chenxwh/omnigen">
|
39 |
-
<img alt="Build" src="https://replicate.com/chenxwh/omnigen/badge">
|
40 |
-
</a>
|
41 |
-
</p>
|
42 |
-
|
43 |
-
<h4 align="center">
|
44 |
-
<p>
|
45 |
-
<a href=#1-news>News</a> |
|
46 |
-
<a href=#3-methodology>Methodology</a> |
|
47 |
-
<a href=#4-what-can-omnigen-do>Capabilities</a> |
|
48 |
-
<a href=#5-quick-start>Quick Start</a> |
|
49 |
-
<a href="#6-finetune">Finetune</a> |
|
50 |
-
<a href="#license">License</a> |
|
51 |
-
<a href="#citation">Citation</a>
|
52 |
-
<p>
|
53 |
-
</h4>
|
54 |
-
|
55 |
-
|
56 |
-
|
57 |
-
## 1. News
|
58 |
-
- 2024-10-28: We release new version of inference code, optimizing the memory usage and time cost. You can refer to [docs/inference.md](docs/inference.md#requiremented-resources) for detailed information.
|
59 |
-
- 2024-10-22: :fire: We release the code for OmniGen. Inference: [docs/inference.md](docs/inference.md) Train: [docs/fine-tuning.md](docs/fine-tuning.md)
|
60 |
-
- 2024-10-22: :fire: We release the first version of OmniGen. Model Weight: [Shitao/OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1) HF Demo: [🤗](https://huggingface.co/spaces/Shitao/OmniGen)
|
61 |
-
|
62 |
-
|
63 |
-
## 2. Overview
|
64 |
-
|
65 |
-
OmniGen is a unified image generation model that can generate a wide range of images from multi-modal prompts. It is designed to be simple, flexible, and easy to use. We provide [inference code](#5-quick-start) so that everyone can explore more functionalities of OmniGen.
|
66 |
-
|
67 |
-
Existing image generation models often require loading several additional network modules (such as ControlNet, IP-Adapter, Reference-Net, etc.) and performing extra preprocessing steps (e.g., face detection, pose estimation, cropping, etc.) to generate a satisfactory image. However, **we believe that the future image generation paradigm should be more simple and flexible, that is, generating various images directly through arbitrarily multi-modal instructions without the need for additional plugins and operations, similar to how GPT works in language generation.**
|
68 |
-
|
69 |
-
Due to the limited resources, OmniGen still has room for improvement. We will continue to optimize it, and hope it inspires more universal image-generation models. You can also easily fine-tune OmniGen without worrying about designing networks for specific tasks; you just need to prepare the corresponding data, and then run the [script](#6-finetune). Imagination is no longer limited; everyone can construct any image-generation task, and perhaps we can achieve very interesting, wonderful, and creative things.
|
70 |
-
|
71 |
-
If you have any questions, ideas, or interesting tasks you want OmniGen to accomplish, feel free to discuss with us: 2906698981@qq.com, wangyueze@tju.edu.cn, zhengliu1026@gmail.com. We welcome any feedback to help us improve the model.
|
72 |
-
|
73 |
-
|
74 |
-
|
75 |
-
## 3. Methodology
|
76 |
-
|
77 |
-
You can see details in our [paper](https://arxiv.org/abs/2409.11340).
|
78 |
-
|
79 |
-
|
80 |
-
|
81 |
-
## 4. What Can OmniGen do?
|
82 |
-
|
83 |
-
OmniGen is a unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, Identity-Preserving Generation, image editing, and image-conditioned generation. **OmniGen doesn't need additional plugins or operations, it can automatically identify the features (e.g., required object, human pose, depth mapping) in input images according to the text prompt.**
|
84 |
-
We showcase some examples in [inference.ipynb](inference.ipynb). And in [inference_demo.ipynb](inference_demo.ipynb), we show an interesting pipeline to generate and modify an image.
|
85 |
-
|
86 |
-
You can control the image generation flexibly via OmniGen
|
87 |
-
![demo](demo_cases.png)
|
88 |
-
|
89 |
-
If you are not entirely satisfied with certain functionalities or wish to add new capabilities, you can try [fine-tuning OmniGen](#6-finetune).
|
90 |
-
|
91 |
-
|
92 |
-
|
93 |
-
## 5. Quick Start
|
94 |
-
|
95 |
-
|
96 |
-
### Using OmniGen
|
97 |
-
Install via Github:
|
98 |
-
```bash
|
99 |
-
git clone https://github.com/staoxiao/OmniGen.git
|
100 |
-
cd OmniGen
|
101 |
-
pip install -e .
|
102 |
-
```
|
103 |
-
|
104 |
-
You also can create a new environment to avoid conflicts:
|
105 |
-
```
|
106 |
-
# Create a python 3.10.12 conda env (you could also use virtualenv)
|
107 |
-
conda create -n omnigen python=3.10.12
|
108 |
-
conda activate omnigen
|
109 |
-
|
110 |
-
# Install pytorch with your CUDA version, e.g.
|
111 |
-
pip install torch==2.3.1+cu118 torchvision --extra-index-url https://download.pytorch.org/whl/cu118
|
112 |
-
|
113 |
-
git clone https://github.com/staoxiao/OmniGen.git
|
114 |
-
cd OmniGen
|
115 |
-
pip install -e .
|
116 |
-
```
|
117 |
-
|
118 |
-
Here are some examples:
|
119 |
-
```python
|
120 |
-
from OmniGen import OmniGenPipeline
|
121 |
-
|
122 |
-
pipe = OmniGenPipeline.from_pretrained("Shitao/OmniGen-v1")
|
123 |
-
# Note: Your local model path is also acceptable, such as 'pipe = OmniGenPipeline.from_pretrained(your_local_model_path)', where all files in your_local_model_path should be organized as https://huggingface.co/Shitao/OmniGen-v1/tree/main
|
124 |
-
|
125 |
-
|
126 |
-
## Text to Image
|
127 |
-
images = pipe(
|
128 |
-
prompt="A curly-haired man in a red shirt is drinking tea.",
|
129 |
-
height=1024,
|
130 |
-
width=1024,
|
131 |
-
guidance_scale=2.5,
|
132 |
-
seed=0,
|
133 |
-
)
|
134 |
-
images[0].save("example_t2i.png") # save output PIL Image
|
135 |
-
|
136 |
-
## Multi-modal to Image
|
137 |
-
# In the prompt, we use the placeholder to represent the image. The image placeholder should be in the format of <img><|image_*|></img>
|
138 |
-
# You can add multiple images in the input_images. Please ensure that each image has its placeholder. For example, for the list input_images [img1_path, img2_path], the prompt needs to have two placeholders: <img><|image_1|></img>, <img><|image_2|></img>.
|
139 |
-
images = pipe(
|
140 |
-
prompt="A man in a black shirt is reading a book. The man is the right man in <img><|image_1|></img>.",
|
141 |
-
input_images=["./imgs/test_cases/two_man.jpg"],
|
142 |
-
height=1024,
|
143 |
-
width=1024,
|
144 |
-
guidance_scale=2.5,
|
145 |
-
img_guidance_scale=1.6,
|
146 |
-
seed=0
|
147 |
-
)
|
148 |
-
images[0].save("example_ti2i.png") # save output PIL image
|
149 |
-
```
|
150 |
-
- If out of memory, you can set `offload_model=True`. If the inference time is too long when inputting multiple images, you can reduce the `max_input_image_size`. For the required resources and the method to run OmniGen efficiently, please refer to [docs/inference.md#requiremented-resources](docs/inference.md#requiremented-resources).
|
151 |
-
- For more examples of image generation, you can refer to [inference.ipynb](inference.ipynb) and [inference_demo.ipynb](inference_demo.ipynb)
|
152 |
-
- For more details about the argument in inference, please refer to [docs/inference.md](docs/inference.md).
|
153 |
-
|
154 |
-
|
155 |
-
### Using Diffusers
|
156 |
-
|
157 |
-
Coming soon.
|
158 |
-
|
159 |
-
|
160 |
-
### Gradio Demo
|
161 |
-
|
162 |
-
We construct an online demo in [Huggingface](https://huggingface.co/spaces/Shitao/OmniGen).
|
163 |
-
|
164 |
-
For the local gradio demo, you need to install `pip install gradio spaces`, and then you can run:
|
165 |
-
```python
|
166 |
-
pip install gradio spaces
|
167 |
-
python app.py
|
168 |
-
```
|
169 |
-
|
170 |
-
#### Use Google Colab
|
171 |
-
To use with Google Colab, please use the following command:
|
172 |
-
|
173 |
-
```
|
174 |
-
!git clone https://github.com/staoxiao/OmniGen.git
|
175 |
-
%cd OmniGen
|
176 |
-
!pip install -e .
|
177 |
-
!pip install gradio spaces
|
178 |
-
!python app.py --share
|
179 |
-
```
|
180 |
-
|
181 |
-
## 6. Finetune
|
182 |
-
We provide a training script `train.py` to fine-tune OmniGen.
|
183 |
-
Here is a toy example about LoRA finetune:
|
184 |
-
```bash
|
185 |
-
accelerate launch --num_processes=1 train.py \
|
186 |
-
--model_name_or_path Shitao/OmniGen-v1 \
|
187 |
-
--batch_size_per_device 2 \
|
188 |
-
--condition_dropout_prob 0.01 \
|
189 |
-
--lr 1e-3 \
|
190 |
-
--use_lora \
|
191 |
-
--lora_rank 8 \
|
192 |
-
--json_file ./toy_data/toy_subject_data.jsonl \
|
193 |
-
--image_path ./toy_data/images \
|
194 |
-
--max_input_length_limit 18000 \
|
195 |
-
--keep_raw_resolution \
|
196 |
-
--max_image_size 1024 \
|
197 |
-
--gradient_accumulation_steps 1 \
|
198 |
-
--ckpt_every 10 \
|
199 |
-
--epochs 200 \
|
200 |
-
--log_every 1 \
|
201 |
-
--results_dir ./results/toy_finetune_lora
|
202 |
-
```
|
203 |
-
|
204 |
-
Please refer to [docs/fine-tuning.md](docs/fine-tuning.md) for more details (e.g. full finetune).
|
205 |
-
|
206 |
-
### Contributors:
|
207 |
-
Thank all our contributors for their efforts and warmly welcome new members to join in!
|
208 |
-
|
209 |
-
<a href="https://github.com/VectorSpaceLab/OmniGen/graphs/contributors">
|
210 |
-
<img src="https://contrib.rocks/image?repo=VectorSpaceLab/OmniGen" />
|
211 |
-
</a>
|
212 |
-
|
213 |
-
## License
|
214 |
-
This repo is licensed under the [MIT License](LICENSE).
|
215 |
-
|
216 |
-
|
217 |
-
## Citation
|
218 |
-
If you find this repository useful, please consider giving a star ⭐ and citation
|
219 |
-
```
|
220 |
-
@article{xiao2024omnigen,
|
221 |
-
title={Omnigen: Unified image generation},
|
222 |
-
author={Xiao, Shitao and Wang, Yueze and Zhou, Junjie and Yuan, Huaying and Xing, Xingrun and Yan, Ruiran and Wang, Shuting and Huang, Tiejun and Liu, Zheng},
|
223 |
-
journal={arXiv preprint arXiv:2409.11340},
|
224 |
-
year={2024}
|
225 |
-
}
|
226 |
-
```
|
|
|
8 |
---
|
9 |
|
10 |
> [!NOTE]
|
11 |
+
> This repo contains bitsandbytes 4bit-NF4 model weights for [OmniGen-v1](https://huggingface.co/Shitao/OmniGen-v1). See the original model card for more info.
|
12 |
|
13 |
<img src="./assets/text_only_1111_4bit_bf16.png" alt="Text Only Comparison">
|
14 |
<img src="./assets/single_img_1111_4bit_bf16.png" alt="Single Image Comparison">
|
15 |
+
<img src="./assets/double_img_1111_4bit_bf16.png" alt="Double Image Comparison">
|
demo_cases.png
DELETED
vae/config.json
DELETED
@@ -1,31 +0,0 @@
|
|
1 |
-
{
|
2 |
-
"_class_name": "AutoencoderKL",
|
3 |
-
"_diffusers_version": "0.18.0.dev0",
|
4 |
-
"_name_or_path": ".",
|
5 |
-
"act_fn": "silu",
|
6 |
-
"block_out_channels": [
|
7 |
-
128,
|
8 |
-
256,
|
9 |
-
512,
|
10 |
-
512
|
11 |
-
],
|
12 |
-
"down_block_types": [
|
13 |
-
"DownEncoderBlock2D",
|
14 |
-
"DownEncoderBlock2D",
|
15 |
-
"DownEncoderBlock2D",
|
16 |
-
"DownEncoderBlock2D"
|
17 |
-
],
|
18 |
-
"in_channels": 3,
|
19 |
-
"latent_channels": 4,
|
20 |
-
"layers_per_block": 2,
|
21 |
-
"norm_num_groups": 32,
|
22 |
-
"out_channels": 3,
|
23 |
-
"sample_size": 1024,
|
24 |
-
"scaling_factor": 0.13025,
|
25 |
-
"up_block_types": [
|
26 |
-
"UpDecoderBlock2D",
|
27 |
-
"UpDecoderBlock2D",
|
28 |
-
"UpDecoderBlock2D",
|
29 |
-
"UpDecoderBlock2D"
|
30 |
-
]
|
31 |
-
}
|
vae/diffusion_pytorch_model.safetensors
DELETED
@@ -1,3 +0,0 @@
|
|
1 |
-
version https://git-lfs.github.com/spec/v1
|
2 |
-
oid sha256:1598f3d24932bcfe6634e8b618ea1e30ab1d57f5aad13a6d2de446d2199f2341
|
3 |
-
size 334643268
|
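The three deleted lines above are a Git LFS pointer file, not the weights themselves: a spec version, an object ID, and the byte size of the real file. As a minimal sketch of how such a pointer can be read, the helper below (`parse_lfs_pointer` is a hypothetical name, not part of any library) splits each line into a key and a value; it assumes the simple three-field layout shown in this diff.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS pointer file into its fields (illustrative helper)."""
    fields = {}
    for line in text.strip().splitlines():
        # Each pointer line has the form "<key> <value>".
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# Pointer contents copied from the deleted vae/diffusion_pytorch_model.safetensors entry.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:1598f3d24932bcfe6634e8b618ea1e30ab1d57f5aad13a6d2de446d2199f2341
size 334643268
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field is the size in bytes of the removed VAE weights file, so this commit drops roughly 335 MB from the repository's LFS storage.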