Commit 8ba54e5 (verified) by jwengr
1 Parent(s): 6c0dae3

Update README.md

Files changed (1)
  1. README.md +19 -15
README.md CHANGED
@@ -15,8 +15,10 @@ pipeline_tag: image-to-image

  This model pipeline demonstrates an advanced workflow for restoring grayscale images, performing inpainting, and converting them to RGB. The pipeline leverages two models based on the Stable Diffusion 2 architecture:

- 1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process.
- 2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image.
+ 1. **Gray-Inpainting Model**: Fills missing regions of a grayscale image using a masked inpainting process based on an **autoencoder (AE)** instead of a variational autoencoder (VAE). This simplifies the model while retaining high-quality reconstruction for the inpainted areas.
+
+ 2. **Gray-to-RGB Conversion Model**: Converts the grayscale image (or inpainted output) into a full-color RGB image by introducing a **residual path in the autoencoder (AE)**. Instead of utilizing a diffusion process, the model directly predicts the latent representation of the color image, enabling efficient and accurate conversion.
+

  ---

@@ -39,41 +41,43 @@ This model pipeline demonstrates an advanced workflow for restoring grayscale im
  ```python
  import torch
  import numpy as np
+
  from PIL import Image
  from diffusers.utils import load_image
- from transformers import AutoModel
+ from transformers import AutoConfig, AutoModel, ModelCard

- # Load and preprocess images
  img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
  mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"

- image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB') # Ensure 3-channel input
+ image_gray = load_image(img_url).resize((512, 512)).convert('L').convert('RGB') # image must have 3 channels
  mask_image = load_image(mask_url).resize((512, 512))
- mask = (np.array(mask_image) > 128) * 1
- image_gray_masked = Image.fromarray(((1 - mask) * np.array(image_gray)).astype(np.uint8))
+ mask = (np.array(mask_image)>128)*1
+ image_gray_masked = Image.fromarray(((1-mask) * np.array(image_gray)).astype(np.uint8))

- # Load models
+ # Load the gray-inpaint model
  gray_inpaintor = AutoModel.from_pretrained(
      'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
      subfolder='gray-inpaint',
-     trust_remote_code=True
+     trust_remote_code=True,
  )
+
+ # Load the gray2rgb model
  gray2rgb = AutoModel.from_pretrained(
      'jwengr/stable-diffusion-2-gray-inpaint-to-rgb',
      subfolder='gray2rgb',
-     trust_remote_code=True
+     trust_remote_code=True,
  )

- # Move models to GPU
+ # Move models to GPU
  gray_inpaintor.to('cuda')
  gray2rgb.to('cuda')

- # Memory-efficient attention (optional)
+ # Enable memory-efficient attention
  # gray2rgb.unet.enable_xformers_memory_efficient_attention()
  # gray_inpaintor.unet.enable_xformers_memory_efficient_attention()

- # Perform image restoration and conversion
- with torch.autocast('cuda', dtype=torch.bfloat16):
+ with torch.autocast('cuda',dtype=torch.bfloat16):
      with torch.no_grad():
-         image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L')
+         # Each model's input should be a PIL.Image, a List[PIL.Image], or a preprocessed tensor (B,3,H,W). Images must have 3 channels.
+         image_gray_restored = gray_inpaintor(image_gray_masked, num_inference_steps=250, seed=10)[0].convert('L') # you can pass the 'mask' arg explicitly. mask : Tensor (B,1,512,512)
          image_restored = gray2rgb(image_gray_restored.convert('RGB'))
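
The masking step in the snippet above (`mask = (np.array(mask_image)>128)*1`, followed by multiplying by `1-mask`) can be tried without downloading either model. The following is a minimal sketch on synthetic arrays, assuming that bright mask pixels mark the region to inpaint, as in the README expression. The helper name `mask_grayscale` and its `threshold` parameter are hypothetical and are not part of the repository.

```python
import numpy as np


def mask_grayscale(image, mask, threshold=128):
    """Zero out pixels of `image` wherever `mask` is brighter than `threshold`.

    image: uint8 array of shape (H, W, 3), a grayscale image replicated to 3 channels.
    mask:  uint8 array of shape (H, W) or (H, W, 3); bright pixels mark the hole.
    (Hypothetical helper mirroring the README's masking expression.)
    """
    image = np.asarray(image)
    mask = np.asarray(mask)
    if mask.ndim == 2:
        # Add a channel axis so the mask broadcasts across the 3 image channels.
        mask = mask[..., None]
    hole = (mask > threshold).astype(np.uint8)  # 1 inside the hole, 0 elsewhere
    return ((1 - hole) * image).astype(np.uint8)


# Synthetic example: a uniform gray 4x4 image with a 2x2 hole in the middle.
image = np.full((4, 4, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 255

masked = mask_grayscale(image, mask)
```

With PIL inputs, as in the README, the result could be wrapped as `Image.fromarray(mask_grayscale(np.array(image_gray), np.array(mask_image)))` before being passed to `gray_inpaintor`.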