erfan-yahoo committed on
Commit a67f43b · 1 Parent(s): 087a5e3

Update README.md

Files changed (1): README.md (+92 −29)
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
  tags:
  - yahoo-open-source-software-incubator
  ---
- # Salient Object Aware Background Generation [![Paper](assets/arxiv.svg)](https://arxiv.org/pdf/2404.10157.pdf)
+ # Salient Object-Aware Background Generation using Text-Guided Diffusion Models [![Paper](assets/arxiv.svg)](https://arxiv.org/pdf/2404.10157.pdf)
  This repository accompanies our paper, [Salient Object-Aware Background Generation using Text-Guided Diffusion Models](https://arxiv.org/abs/2404.10157), which has been accepted for publication in the [CVPR 2024 Generative Models for Computer Vision](https://generative-vision.github.io/workshop-CVPR-24/) workshop.

  The paper addresses an issue we call "object expansion" that arises when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as [Stable Inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. Some examples of object expansion are shown below:
@@ -12,39 +12,102 @@ The paper addresses an issue we call "object expansion" when generating backgrou
  <img src="assets/fig.jpg">
  </div>

- ## Setup
- 
- The dependencies are provided in `requirements.txt`; install them with:
- 
- ```bash
- pip install -r requirements.txt
- ```
- 
- ## Usage
- ### Training
- 
- The following runs the training of the text-to-image inpainting ControlNet, initialized with the weights of "stable-diffusion-2-inpainting":
- ```bash
- accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet_inpaint.py --pretrained_model_name_or_path "stable-diffusion-2-inpainting" --proportion_empty_prompts 0.1
- ```
- 
- The following runs the training of the text-to-image ControlNet, initialized with the weights of "stable-diffusion-2-base":
- ```bash
- accelerate launch --multi_gpu --mixed_precision=fp16 --num_processes=8 train_controlnet.py --pretrained_model_name_or_path "stable-diffusion-2-base" --proportion_empty_prompts 0.1
- ```
- 
- ### Inference
- 
- Please refer to `inference.ipynb`. To run the code you need to download our model checkpoints.
- 
- ## Model Checkpoints
- 
- | Model link | Datasets used |
- |------------|---------------|
- | [controlnet_inpainting_salient_aware.pth](https://drive.google.com/file/d/1ad4CNJqFI_HnXFFRqcS4mOD0Le2Mvd3L/view?usp=sharing) | Salient segmentation datasets, COCO |
- 
+ ## Inference
+ 
+ ### Load pipeline
+ ```py
+ import torch
+ from diffusers import (
+     AutoencoderKL,
+     ControlNetModel,
+     DDPMScheduler,
+     StableDiffusionControlNetInpaintPipeline,
+     UNet2DConditionModel,
+     UniPCMultistepScheduler,
+ )
+ from transformers import AutoTokenizer, CLIPTextModel
+ 
+ # Load our pretrained ControlNet
+ controlnet = ControlNetModel.from_pretrained('yahoo-inc/photo-background-generation')
+ 
+ # Load the Stable Inpainting 2.0 components
+ sd_inpainting_model_name = "stabilityai/stable-diffusion-2-inpainting"
+ tokenizer = AutoTokenizer.from_pretrained(
+     sd_inpainting_model_name,
+     subfolder="tokenizer",
+     use_fast=False,
+ )
+ noise_scheduler = DDPMScheduler.from_pretrained(sd_inpainting_model_name, subfolder="scheduler")
+ text_encoder = CLIPTextModel.from_pretrained(
+     sd_inpainting_model_name, subfolder="text_encoder", revision=None
+ )
+ vae = AutoencoderKL.from_pretrained(sd_inpainting_model_name, subfolder="vae", revision=None)
+ unet = UNet2DConditionModel.from_pretrained(
+     sd_inpainting_model_name, subfolder="unet", revision=None
+ )
+ 
+ # Create the SD-based inpainting pipeline with our ControlNet attached
+ pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
+     sd_inpainting_model_name,
+     vae=vae,
+     text_encoder=text_encoder,
+     tokenizer=tokenizer,
+     unet=unet,
+     controlnet=controlnet,
+     safety_checker=None,
+     revision=None,
+     torch_dtype=torch.float32,
+ )
+ pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
+ pipeline = pipeline.to('cuda')
+ pipeline.set_progress_bar_config(disable=True)
+ ```
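+ 
+ On smaller GPUs, the same pipeline can instead be loaded in half precision with CPU offloading. This is a minimal alternative sketch (not part of our original notebook), using standard `diffusers` APIs and assuming `accelerate` is installed:
+ ```py
+ # Alternative, memory-friendlier setup: fp16 weights plus CPU offload.
+ # Reuses sd_inpainting_model_name from the block above.
+ controlnet_fp16 = ControlNetModel.from_pretrained(
+     'yahoo-inc/photo-background-generation', torch_dtype=torch.float16
+ )
+ pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
+     sd_inpainting_model_name,
+     controlnet=controlnet_fp16,
+     safety_checker=None,
+     torch_dtype=torch.float16,  # halves the memory footprint of the weights
+ )
+ pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
+ pipeline.enable_model_cpu_offload()  # moves idle components off the GPU; replaces .to('cuda')
+ ```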
+ 
+ ### Load an image and extract its background and foreground
+ ```py
+ from io import BytesIO
+ 
+ import requests
+ from PIL import Image, ImageOps
+ from transparent_background import Remover
+ 
+ def resize_with_padding(img, expected_size):
+     # Shrink to fit inside expected_size, then pad symmetrically to the exact size
+     img.thumbnail((expected_size[0], expected_size[1]))
+     delta_width = expected_size[0] - img.size[0]
+     delta_height = expected_size[1] - img.size[1]
+     pad_width = delta_width // 2
+     pad_height = delta_height // 2
+     padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
+     return ImageOps.expand(img, padding)
+ 
+ image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg/2560px-Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg'
+ response = requests.get(image_url)
+ img = Image.open(BytesIO(response.content))
+ img = resize_with_padding(img, (512, 512))
+ 
+ # Load the salient object detection model (base checkpoint)
+ remover = Remover(mode='base')
+ 
+ # Get the foreground mask as a grayscale map
+ fg_mask = remover.process(img, type='map')
+ ```
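+ 
+ Before generating, it can help to sanity-check the segmentation. A small optional sketch (the `type='rgba'` output mode comes from the `transparent-background` package and is an assumption here, not part of our notebook):
+ ```py
+ # Save the padded input and the predicted mask for inspection
+ img.save('input_512.png')
+ fg_mask.save('fg_mask.png')
+ 
+ # Foreground cutout on a transparent background, to eyeball the object boundary
+ cutout = remover.process(img, type='rgba')
+ cutout.save('foreground.png')
+ ```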
+ 
+ ### Background generation
+ ```py
+ seed = 13
+ mask = ImageOps.invert(fg_mask)  # inpaint everything except the salient object
+ generator = torch.Generator(device='cuda').manual_seed(seed)
+ prompt = 'A dark swan in a bedroom'
+ cond_scale = 1.0
+ with torch.autocast("cuda"):
+     controlnet_image = pipeline(
+         prompt=prompt,
+         image=img,
+         mask_image=mask,
+         control_image=mask,
+         num_images_per_prompt=1,
+         generator=generator,
+         num_inference_steps=20,
+         guess_mode=False,
+         controlnet_conditioning_scale=cond_scale,
+     ).images[0]
+ controlnet_image  # displays the generated image in a notebook
+ ```
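+ 
+ Since generation is seeded, sweeping a few seeds is a cheap way to collect candidate backgrounds for the same prompt; a short usage sketch (the output filenames are illustrative):
+ ```py
+ # Save the result above
+ controlnet_image.save('swan_in_bedroom.png')
+ 
+ # Generate a few alternatives with different seeds
+ for s in (0, 1, 2):
+     g = torch.Generator(device='cuda').manual_seed(s)
+     with torch.autocast("cuda"):
+         out = pipeline(
+             prompt=prompt, image=img, mask_image=mask, control_image=mask,
+             generator=g, num_inference_steps=20,
+             controlnet_conditioning_scale=cond_scale,
+         ).images[0]
+     out.save(f'swan_in_bedroom_seed{s}.png')
+ ```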
  ## Citations

  If you found our work useful, please consider citing our paper: