
	---
	base_model:
	- stabilityai/stable-diffusion-2-inpainting
	- stabilityai/stable-diffusion-2-1
	pipeline_tag: image-to-image
	library_name: diffusers
	tags:
	- inpaint
	- colorization
	- stable-diffusion
	---
	# Example Outputs

	| Step | Grayscale Image (Masked) | Restored Grayscale Image | Fully Restored RGB Image |
	|------|--------------------------|--------------------------|--------------------------|
	| Image | ![image_gray_masked](gray-masked.png) | ![image_gray_restored](gray-inpaint-example.png) | ![image_restored](gray-to-rgb-example.png) |

	---

	# Stable Diffusion 2-Based Gray-Inpainting to RGB


	This repository contains two models that are applied in sequence:

	1. Gray-Inpainting Model: Fills missing regions of a grayscale image using a masked inpainting diffusion process built on an autoencoder (AE) rather than a variational autoencoder (VAE). It includes a mask detector, so it can restore an image without mask information; you can also pass the mask explicitly.

	2. Gray-to-RGB Conversion Model: Converts the grayscale image (or the inpainted output) into a full-color RGB image by adding a residual path to the AE. The internal UNet directly predicts the difference between the latents of the grayscale image and the color image.
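
The residual idea in item 2 can be illustrated with a toy NumPy sketch. This is not the model's code: `colorize_latent`, `constant_delta`, and the latent shape are hypothetical stand-ins, and the sketch assumes the color latent is obtained by adding the predicted difference to the grayscale latent.

```python
import numpy as np

def colorize_latent(gray_latent, predict_delta):
    """Toy residual step: color latent = gray latent + predicted difference."""
    delta = predict_delta(gray_latent)  # stand-in for the UNet prediction
    return gray_latent + delta

# Hypothetical latent of shape (B, C, H, W); the real shape depends on the AE.
gray_latent = np.zeros((1, 4, 8, 8), dtype=np.float32)

def constant_delta(z):
    """Dummy predictor that returns a constant offset with the same shape as z."""
    return np.full_like(z, 0.5)

color_latent = colorize_latent(gray_latent, constant_delta)
print(color_latent.shape)
```

With a predictor that returns zeros, the output equals the grayscale latent, so under this assumption the network only has to model what color adds on top of the grayscale content.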


	---

	## Code Example

	```python
	import torch
	import numpy as np

	from PIL import Image
	from diffusers.utils import load_image
	from transformers import AutoModel

	img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
	mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

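	# The models expect 3-channel input, so convert the grayscale image back to RGB
	image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB')
	mask_image = load_image(mask_url).resize((512, 512))
	# Binary mask: 1 where the mask image is bright (> 128), 0 elsewhere
	mask = (np.array(mask_image) > 128) * 1
	# Zero out the masked region of the grayscale image
	image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))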

	# Load the gray-inpaint model
	gray_inpaintor = AutoModel.from_pretrained(
	    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
	    subfolder='gray-inpaint',
	    trust_remote_code=True,
	)

	# Load the gray2rgb model
	gray2rgb = AutoModel.from_pretrained(
	    'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
	    subfolder='gray2rgb',
	    trust_remote_code=True,
	)

	# Move models to GPU
	gray_inpaintor.to('cuda')
	gray2rgb.to('cuda')

	# Enable memory-efficient attention
	# gray2rgb.unet.enable_xformers_memory_efficient_attention()
	# gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

	with torch.autocast('cuda', dtype=torch.bfloat16):
	    with torch.no_grad():
	        # Each model accepts a PIL.Image, a List[PIL.Image], or a preprocessed tensor of shape (B, 3, H, W).
	        # Images must have 3 channels.
	        # The mask can also be passed explicitly via the 'mask' argument: a Tensor of shape (B, 1, 512, 512).
	        image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
	        image_restored = gray2rgb(image_gray_restored.convert('RGB'))