---
base_model:
- stabilityai/stable-diffusion-2-inpainting
- stabilityai/stable-diffusion-2-1
pipeline_tag: image-to-image
library_name: diffusers
tags:
- inpaint
- colorization
- stable-diffusion
---

# **Example Outputs**

| **Step** | **Grayscale Image (Masked)** | **Restored Grayscale Image** | **Fully Restored RGB Image** |
|----------|------------------------------|------------------------------|------------------------------|
| **Image** | ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |

---

# **Stable Diffusion 2-Based Gray-Inpainting to RGB**

This repository contains two models that run in sequence:

1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting diffusion process built on an autoencoder (AE) instead of a variational autoencoder (VAE). It includes a mask detector, so it can restore an image without mask information; you can also pass a mask explicitly.

2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or the inpainted output) into a full-color RGB image by adding a residual path to the AE. The internal UNet directly predicts the difference between the latents of the grayscale image and the color image.
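
The residual idea in step 2 can be illustrated with a small NumPy sketch. This is not the model's actual implementation: `apply_latent_residual`, `toy_delta_predictor`, and the latent shape `(1, 4, 64, 64)` are hypothetical stand-ins chosen for illustration. The sketch only shows the arithmetic described above, where a network predicts the difference between the grayscale and color latents and that difference is added back to the grayscale latent.

```python
import numpy as np


def apply_latent_residual(gray_latent, predict_delta):
    """Estimate a color latent as gray latent + predicted difference.

    `predict_delta` is a hypothetical stand-in for the internal UNet:
    any callable that maps a latent to a same-shaped difference.
    """
    delta = predict_delta(gray_latent)
    if delta.shape != gray_latent.shape:
        raise ValueError("predicted difference must match the latent shape")
    return gray_latent + delta


def toy_delta_predictor(latent):
    """Toy predictor (assumption): returns a constant offset of 0.1."""
    return np.full_like(latent, 0.1)


# Assumed latent shape for illustration only: (batch, channels, height, width).
gray_latent = np.zeros((1, 4, 64, 64), dtype=np.float32)
color_latent = apply_latent_residual(gray_latent, toy_delta_predictor)
print(color_latent.shape)
```

Predicting only the difference means the network does not have to reproduce structure that is already present in the grayscale latent; it only has to supply what is missing, which here is the color information.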

---

## **Code Example**

```python
import torch
import numpy as np

from PIL import Image
from diffusers.utils import load_image
from transformers import AutoModel

img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

# Convert to grayscale, then back to RGB: the models expect 3-channel input.
image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')
mask_image = load_image(mask_url).resize((512, 512))

# Binary mask: 1 where pixels are missing, 0 where they are kept.
mask = (np.array(mask_image) > 128) * 1
# Zero out the masked region of the grayscale image.
image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))

# Load the gray-inpaint model
gray_inpaintor = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray-inpaint',
    trust_remote_code=True,
)

# Load the gray2rgb model
gray2rgb = AutoModel.from_pretrained(
    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
    subfolder='gray2rgb',
    trust_remote_code=True,
)

# Move models to GPU
gray_inpaintor.to('cuda')
gray2rgb.to('cuda')

# Optional: enable memory-efficient attention
# gray2rgb.unet.enable_xformers_memory_efficient_attention()
# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

with torch.autocast('cuda', dtype=torch.bfloat16):
    with torch.no_grad():
        # Each model accepts a PIL.Image, a List[PIL.Image], or a preprocessed
        # tensor of shape (B, 3, H, W). Images must have 3 channels.
        # You can also pass the 'mask' argument explicitly:
        # a Tensor of shape (B, 1, 512, 512).
        image_gray_restored = gray_inpaintor(
            image_gray_masked, num_inference_steps=250, seed=10
        )[0].convert('L')
        image_restored = gray2rgb(image_gray_restored.convert('RGB'))
```