---

license: apache-2.0
language:
- en
library_name: diffusers
pipeline_tag: image-to-image
---


# InstantIR Model Card

<!-- > **InstantIR: Blind Image Restoration with Instant Generative Reference**<br>
> Jen-Yuan Huang<sup>1,2</sup>, Haofan Wang<sup>2</sup>, Qixun Wang<sup>2</sup>, Xu Bai<sup>2</sup>, Hao Ai<sup>2</sup>, Peng Xing<sup>2</sup>, Jen-Tse Huang<sup>3</sup> <br>
> <sup>1</sup>Peking University, <sup>2</sup>InstantX Team, <sup>3</sup>The Chinese University of Hong Kong -->

<a href='https://arxiv.org/abs/2410.06551'><img src='https://img.shields.io/badge/arXiv-b31b1b.svg'></a>
<a href='https://jy-joy.github.io/InstantIR'><img src='https://img.shields.io/badge/Website-informational'></a>
<a href='https://github.com/JY-Joy/InstantIR'><img src='https://img.shields.io/badge/Github-gray'></a>

> **InstantIR** is a novel single-image restoration model designed to resurrect your damaged images, delivering extreme-quality yet realistic details. You can further boost **InstantIR** performance with additional text prompts, and even achieve customized editing!

<div  align="center">
<img src='assets/teaser_figure.png'>
</div>


## Usage

### 1. Clone the GitHub repo
```sh
git clone https://github.com/JY-Joy/InstantIR.git
cd InstantIR
```

### 2. Download model weights
You can download the InstantIR weights directly from this repository, or
fetch them with a Python script:

```python
from huggingface_hub import hf_hub_download

# Each filename already starts with "models/", and hf_hub_download keeps the
# repo-relative path under local_dir, so local_dir="." places the files in ./models.
hf_hub_download(repo_id="InstantX/InstantIR", filename="models/adapter.pt", local_dir=".")
hf_hub_download(repo_id="InstantX/InstantIR", filename="models/aggregator.pt", local_dir=".")
hf_hub_download(repo_id="InstantX/InstantIR", filename="models/previewer_lora_weights.bin", local_dir=".")
```
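The loading code in step 3 reads the weights from `./models/adapter.pt`, `./models/aggregator.pt`, and the `./models` directory (for the previewer LoRA). Before loading, it can help to confirm the files are where that code expects them. The following is a minimal sketch using only the standard library; the helper name `missing_weights` is hypothetical and not part of the InstantIR repository.

```python
from pathlib import Path

# File names taken from the download step above.
EXPECTED_WEIGHTS = ("adapter.pt", "aggregator.pt", "previewer_lora_weights.bin")


def missing_weights(models_dir="./models"):
    """Return the expected weight files that are absent from models_dir."""
    root = Path(models_dir)
    return [name for name in EXPECTED_WEIGHTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_weights()
    if missing:
        print("Missing weight files:", ", ".join(missing))
    else:
        print("All InstantIR weight files found.")
```

If any file is reported missing, check whether it was downloaded into a nested directory such as `./models/models`.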

### 3. Load InstantIR with 🧨 diffusers

```python
# !pip install opencv-python transformers accelerate
import torch
from PIL import Image

from diffusers import DDPMScheduler, StableDiffusionXLPipeline
from transformers import AutoImageProcessor, AutoModel

# modules from the cloned InstantIR repository
from schedulers.lcm_single_step_scheduler import LCMSingleStepScheduler
from module.ip_adapter.utils import init_ip_adapter_in_unet
from module.ip_adapter.resampler import Resampler
from pipelines.sdxl_instantir import InstantIRPipeline

# prepare the DINOv2 image encoder and its processor
image_encoder = AutoModel.from_pretrained('facebook/dinov2-large')
image_processor = AutoImageProcessor.from_pretrained('facebook/dinov2-large')

# paths to the InstantIR weights under ./models
dcp_adapter = './models/adapter.pt'
previewer_lora_path = './models'
instantir_path = './models/aggregator.pt'

# load SDXL
sdxl = StableDiffusionXLPipeline.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', torch_dtype=torch.float16)

# load adapter
image_proj_model = Resampler(
    embedding_dim=image_encoder.config.hidden_size,
    output_dim=sdxl.unet.config.cross_attention_dim,
)
init_ip_adapter_in_unet(
    sdxl.unet,
    image_proj_model,
    dcp_adapter,
)

# assemble the InstantIR pipeline from the SDXL components
pipe = InstantIRPipeline(
    sdxl.vae, sdxl.text_encoder, sdxl.text_encoder_2, sdxl.tokenizer, sdxl.tokenizer_2,
    sdxl.unet, sdxl.scheduler, feature_extractor=image_processor, image_encoder=image_encoder,
)
pipe.cuda()

# load previewer lora
pipe.prepare_previewers(previewer_lora_path)
pipe.unet.to(dtype=torch.float16)
pipe.scheduler = DDPMScheduler.from_pretrained('stabilityai/stable-diffusion-xl-base-1.0', subfolder="scheduler")
lcm_scheduler = LCMSingleStepScheduler.from_config(pipe.scheduler.config)

# load aggregator weights
pretrained_state_dict = torch.load(instantir_path)
pipe.aggregator.load_state_dict(pretrained_state_dict)
pipe.aggregator.to(dtype=torch.float16)
```

You can then restore a degraded image with:

```python
# load a broken image
image = Image.open('path/to/your-image').convert("RGB")

# InstantIR restoration
image = pipe(
    prompt='',
    image=image,
    ip_adapter_image=[image],
    negative_prompt='',
    guidance_scale=7.0,
    previewer_scheduler=lcm_scheduler,
    return_dict=False,
)[0]
```
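SDXL's VAE downsamples images by a factor of 8, so SDXL-based pipelines generally work with widths and heights that are multiples of 8. Whether `InstantIRPipeline` already resizes its input internally is not stated here, so treat the following as an optional pre-processing sketch under that assumption. The helper name `snap_size` is hypothetical and not part of the InstantIR repository.

```python
def snap_size(width, height, multiple=8):
    """Round each dimension down to the nearest multiple, keeping at least one multiple."""
    snapped_w = max(multiple, (width // multiple) * multiple)
    snapped_h = max(multiple, (height // multiple) * multiple)
    return snapped_w, snapped_h


if __name__ == "__main__":
    print(snap_size(1023, 770))  # (1016, 768)
```

With Pillow, `image = image.resize(snap_size(*image.size))` would apply this to the loaded image before it is passed to `pipe(...)`.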

For more details, including text-guided enhancement and editing, please refer to our [GitHub repository](https://github.com/JY-Joy/InstantIR).

<!-- ## Usage Tips
1. If you're not satisfied with the similarity, try to increase the weight of "IdentityNet Strength" and "Adapter Strength".
2. If you feel that the saturation is too high, first decrease the Adapter strength. If it is still too high, then decrease the IdentityNet strength.
3. If you find that text control is not as expected, decrease Adapter strength.
4. If you find that realistic style is not good enough, go for our Github repo and use a more realistic base model. -->

## Examples

<div  align="center">
<img src='assets/qualitative_real.png'>
</div>

<div  align="center">
<img src='assets/outdomain_preview.png'>
</div>

## Disclaimer

This project is released under the Apache License 2.0 and aims to have a positive impact on the field of AI-driven image generation. Users are free to create images with this tool, but they must comply with local laws and use it responsibly. The developers assume no responsibility for potential misuse by users.

## Citation
```bibtex
@article{huang2024instantir,
  title={InstantIR: Blind Image Restoration with Instant Generative Reference},
  author={Huang, Jen-Yuan and Wang, Haofan and Wang, Qixun and Bai, Xu and Ai, Hao and Xing, Peng and Huang, Jen-Tse},
  journal={arXiv preprint arXiv:2410.06551},
  year={2024}
}
```