erfan-yahoo committed · Commit a67f43b · 1 Parent(s): 087a5e3

Update README.md

README.md CHANGED
---
license: apache-2.0
tags:
- yahoo-open-source-software-incubator
---

# Salient Object-Aware Background Generation using Text-Guided Diffusion Models [![Paper](assets/arxiv.svg)](https://arxiv.org/pdf/2404.10157.pdf)
This repository accompanies our paper, [Salient Object-Aware Background Generation using Text-Guided Diffusion Models](https://arxiv.org/abs/2404.10157), which has been accepted for publication in the [CVPR 2024 Generative Models for Computer Vision](https://generative-vision.github.io/workshop-CVPR-24/) workshop.

The paper addresses an issue we call "object expansion" when generating backgrounds for salient objects using inpainting diffusion models. We show that models such as [Stable Inpainting](https://huggingface.co/stabilityai/stable-diffusion-2-inpainting) can sometimes arbitrarily expand or distort the salient object, which is undesirable in applications where the object's identity should be preserved, such as e-commerce ads. We provide some examples of object expansion below:

<div align="center">
<img src="assets/fig.jpg">
</div>

# Inference

### Load pipeline
```py
import torch
from diffusers import (
    AutoencoderKL,
    ControlNetModel,
    DDPMScheduler,
    StableDiffusionControlNetInpaintPipeline,
    UNet2DConditionModel,
    UniPCMultistepScheduler,
)
from transformers import AutoTokenizer, CLIPTextModel

# Load our pretrained ControlNet
controlnet = ControlNetModel.from_pretrained('yahoo-inc/photo-background-generation')

# Load the Stable Inpainting 2.0 components
sd_inpainting_model_name = "stabilityai/stable-diffusion-2-inpainting"
tokenizer = AutoTokenizer.from_pretrained(
    sd_inpainting_model_name,
    subfolder="tokenizer",
    use_fast=False,
)
noise_scheduler = DDPMScheduler.from_pretrained(sd_inpainting_model_name, subfolder="scheduler")
text_encoder = CLIPTextModel.from_pretrained(
    sd_inpainting_model_name, subfolder="text_encoder", revision=None
)
vae = AutoencoderKL.from_pretrained(sd_inpainting_model_name, subfolder="vae", revision=None)
unet = UNet2DConditionModel.from_pretrained(
    sd_inpainting_model_name, subfolder="unet", revision=None
)

# Create the SD-based inpainting pipeline
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    sd_inpainting_model_name,
    vae=vae,
    text_encoder=text_encoder,
    tokenizer=tokenizer,
    unet=unet,
    controlnet=controlnet,
    safety_checker=None,
    revision=None,
    torch_dtype=torch.float32,
)
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
pipeline.set_progress_bar_config(disable=True)
```
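The snippet above assembles every component in full precision. If GPU memory is limited, the same pipeline can usually be loaded in half precision instead; the sketch below is our addition rather than part of the original instructions, and it lets `from_pretrained` fetch the remaining components itself:

```py
# Optional, lower-memory variant (our addition, not from the original README):
# load the ControlNet and the inpainting pipeline in fp16 on a CUDA GPU.
import torch
from diffusers import (
    ControlNetModel,
    StableDiffusionControlNetInpaintPipeline,
    UniPCMultistepScheduler,
)

controlnet = ControlNetModel.from_pretrained(
    'yahoo-inc/photo-background-generation', torch_dtype=torch.float16
)
pipeline = StableDiffusionControlNetInpaintPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-inpainting",
    controlnet=controlnet,
    safety_checker=None,
    torch_dtype=torch.float16,
)
pipeline.scheduler = UniPCMultistepScheduler.from_config(pipeline.scheduler.config)
pipeline = pipeline.to('cuda')
```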

### Load an image and extract its background and foreground
```py
from PIL import Image, ImageOps
import requests
from io import BytesIO
from transparent_background import Remover

def resize_with_padding(img, expected_size):
    # Shrink the image to fit inside expected_size, then pad it to exactly that size
    img.thumbnail((expected_size[0], expected_size[1]))
    delta_width = expected_size[0] - img.size[0]
    delta_height = expected_size[1] - img.size[1]
    pad_width = delta_width // 2
    pad_height = delta_height // 2
    padding = (pad_width, pad_height, delta_width - pad_width, delta_height - pad_height)
    return ImageOps.expand(img, padding)

image_url = 'https://upload.wikimedia.org/wikipedia/commons/thumb/1/16/Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg/2560px-Granja_comary_Cisne_-_Escalavrado_e_Dedo_De_Deus_ao_fundo_-Teres%C3%B3polis.jpg'
response = requests.get(image_url)
img = Image.open(BytesIO(response.content))
img = resize_with_padding(img, (512, 512))

# Load the salient object detection model
remover = Remover(mode='base')  # default checkpoint; see the transparent_background docs for other modes

# Get the foreground (salient object) mask
fg_mask = remover.process(img, type='map')  # type='map' returns the saliency mask instead of a transparent image
```
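Before moving on, it can help to sanity-check the extracted mask; a minimal sketch (our addition; the filename is illustrative):

```py
# Sanity-check the salient-object mask (our addition, not from the original
# README): white pixels mark the foreground that will be preserved.
fg_mask.save('foreground_mask.png')  # illustrative filename
print('image size:', img.size, '- mask size:', fg_mask.size)
```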

### Background generation
```py
seed = 13
mask = ImageOps.invert(fg_mask)  # invert: white marks the background region to inpaint
img = resize_with_padding(img, (512, 512))  # no-op if the image is already 512x512
generator = torch.Generator(device='cuda').manual_seed(seed)
prompt = 'A dark swan in a bedroom'
cond_scale = 1.0

with torch.autocast("cuda"):
    controlnet_image = pipeline(
        prompt=prompt,
        image=img,
        mask_image=mask,
        control_image=mask,
        num_images_per_prompt=1,
        generator=generator,
        num_inference_steps=20,
        guess_mode=False,
        controlnet_conditioning_scale=cond_scale,
    ).images[0]
controlnet_image  # in a notebook, this displays the generated image
```
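Diffusion inpainting can still soften pixels near the mask boundary. When the application requires a pixel-exact object, e-commerce ads being the motivating example above, an optional post-processing step (our suggestion, not part of the released pipeline) is to composite the original foreground back over the generated background:

```py
# Optional post-processing (our addition, not from the original README):
# paste the original salient object back over the generated background.
from PIL import Image

# Image.composite takes pixels from `img` where the grayscale mask is white
# and from `controlnet_image` where it is black.
final_image = Image.composite(img, controlnet_image, fg_mask.convert('L'))
final_image.save('background_generated.png')  # illustrative filename
```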

## Citations

If you found our work useful, please consider citing our paper: