erfan-yahoo committed · Commit a67f43b · 1 Parent(s): 087a5e3

Update README.md

README.md CHANGED
---
license: apache-2.0
tags:
- yahoo-open-source-software-incubator
---

# Salient Object-Aware Background Generation using Text-Guided Diffusion Models [![Paper](assets/arxiv.svg)](https://arxiv.org/pdf/2404.10157.pdf)
This repository accompanies our paper, [Salient Object-Aware Background Generation using Text-Guided Diffusion Models](https://arxiv.org/abs/2404.10157), which has been accepted for publication in the [CVPR 2024 Generative Models for Computer Vision](https://generative-vision.github.io/workshop-CVPR-24/) workshop.

The paper addresses an issue we call "object expansion" when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as [Stable Inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. We provide some examples of object expansion below:

<div align="center">
<img src="assets/fig.jpg">
</div>

# Inference

### Load pipeline
```py
import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DDPMScheduler,
    StableDiffusionControlNetInpaintPipeline,
    UNet2DConditionModel,
    UniPCMultistepScheduler,
)
from transformers import AutoTokenizer, CLIPTextModel

# Load our pretrained ControlNet
controlnet = ControlNetModel.from_pretrained('yahoo-inc/photo-background-generation')

# Load the Stable Inpainting 2.0 components
sd_inpainting_model_name = "stabilityai/stable-diffusion-2-inpainting"
tokenizer = AutoTokenizer.from_pretrained(
    sd_inpainting_model_name,
    subfolder="tokenizer",
    use_fast=False,
)
noise_scheduler = DDPMScheduler.from_pretrained(sd_inpainting_model_name, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained(
    sd_inpainting_model_name, subfolder="text_encoder", revision=None
)
vae = AutoencoderKL.from_pretrained(sd_inpainting_model_name, subfolder="vae", revision=None)
unet = UNet2DConditionModel.from_pretrained(
    sd_inpainting_model_name, subfolder="unet", revision=None
)

# Create the SD-based inpainting pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    sd_inpainting_model_name,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    controlnet=controlnet,
    safety_checker=None,
    revision=None,
    torch_dtype=torch.float32,
)
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
pipeline.set_progress_bar_config(disable=True)
```
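The snippet above assembles every component in full precision. If GPU memory is limited, the same pipeline can usually be loaded in half precision instead; the sketch below is our addition rather than part of the original instructions, and it lets `from_pretrained` fetch the remaining components itself:

```py
# Optional, lower-memory variant (our addition, not from the original README):
# load the ControlNet and the inpainting pipeline in fp16 on a CUDA GPU.
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetInpaintPipeline,
    UniPCMultistepScheduler,
)

controlnet = ControlNetModel.from_pretrained(
    'yahoo-inc/photo-background-generation', torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16,
)
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
```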

### Load an image and extract its background and foreground
```py
from PIL import Image, ImageOps
import requests
from io import BytesIO
from transparent_background import Remover

def resize_with_padding(img, expected_size):
    # Shrink the image to fit inside expected_size, then pad it to exactly that size
    img.thumbnail((expected_size[0], expected_size[1]))
    delta_width = expected_size[0] - img.size[0]
    delta_height = expected_size[1] - img.size[1]
    pad_width = delta_width // 2
    pad_height = delta_height // 2
    padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
    return ImageOps.expand(img, padding)

image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg/2560px-Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg'
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
img = resize_with_padding(img, (512, 512))

# Load the salient object detection model
remover = Remover(mode='base')  # default checkpoint; see the transparent_background docs for other modes

# Get the foreground (salient object) mask
fg_mask = remover.process(img, type='map')  # type='map' returns the saliency mask instead of a transparent image
```
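Before moving on, it can help to sanity-check the extracted mask; a minimal sketch (our addition; the filename is illustrative):

```py
# Sanity-check the salient-object mask (our addition, not from the original
# README): white pixels mark the foreground that will be preserved.
fg_mask.save('foreground_mask.png')  # illustrative filename
print('image size:', img.size, '- mask size:', fg_mask.size)
```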

### Background generation
```py
seed = 13
mask = ImageOps.invert(fg_mask)  # invert: white marks the background region to inpaint
img = resize_with_padding(img, (512, 512))  # no-op if the image is already 512x512
generator = torch.Generator(device='cuda').manual_seed(seed)
prompt = 'A dark swan in a bedroom'
cond_scale = 1.0

with torch.autocast("cuda"):
    controlnet_image = pipeline(
        prompt=prompt,
        image=img,
        mask_image=mask,
        control_image=mask,
        num_images_per_prompt=1,
        generator=generator,
        num_inference_steps=20,
        guess_mode=False,
        controlnet_conditioning_scale=cond_scale,
    ).images[0]
controlnet_image  # in a notebook, this displays the generated image
```
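Diffusion inpainting can still soften pixels near the mask boundary. When the application requires a pixel-exact object, e-commerce ads being the motivating example above, an optional post-processing step (our suggestion, not part of the released pipeline) is to composite the original foreground back over the generated background:

```py
# Optional post-processing (our addition, not from the original README):
# paste the original salient object back over the generated background.
from PIL import Image

# Image.composite takes pixels from `img` where the grayscale mask is white
# and from `controlnet_image` where it is black.
final_image = Image.composite(img, controlnet_image, fg_mask.convert('L'))
final_image.save('background_generated.png')  # illustrative filename
```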

## Citations

If you found our work useful, please consider citing our paper: