Update README.md

README.md +1 -157

@@ -1,157 +1 @@
----
-license: openrail++
-base_model: stabilityai/stable-diffusion-xl-base-1.0
-tags:
-- stable-diffusion-xl
-- stable-diffusion-xl-diffusers
-- text-to-image
-- diffusers
-- controlnet
-inference: false
-duplicated_from: diffusers/controlnet-zoe-depth-sdxl-1.0
----
-# SDXL-controlnet: Zoe-Depth
-These are ControlNet weights trained on stabilityai/stable-diffusion-xl-base-1.0 with Zoe depth conditioning. [Zoe-depth](https://github.com/isl-org/ZoeDepth) is an open-source, state-of-the-art depth estimation model that produces high-quality depth maps well suited for conditioning.
-Some example images are shown below.
-![images_0](./zoe-depth-example.png)
-![images_2](./zoe-megatron.png)
-![images_3](./photo-woman.png)
-## Usage
-First, install the required libraries:
-```bash
-pip install accelerate transformers safetensors diffusers
-```
-Then set up the Zoe-depth model:
-```python
-import torch
-import matplotlib
-import matplotlib.cm
-import numpy as np
-from PIL import Image
-torch.hub.help("intel-isl/MiDaS", "DPT_BEiT_L_384", force_reload=True)  # Triggers fresh download of MiDaS repo
-model_zoe_n = torch.hub.load("isl-org/ZoeDepth", "ZoeD_NK", pretrained=True).eval()
-model_zoe_n = model_zoe_n.to("cuda")
-def colorize(value, vmin=None, vmax=None, cmap='gray_r', invalid_val=-99, invalid_mask=None, background_color=(128, 128, 128, 255), gamma_corrected=False, value_transform=None):
-    if isinstance(value, torch.Tensor):
-        value = value.detach().cpu().numpy()
-    value = value.squeeze()
-    if invalid_mask is None:
-        invalid_mask = value == invalid_val
-    mask = np.logical_not(invalid_mask)
-    # normalize
-    vmin = np.percentile(value[mask],2) if vmin is None else vmin
-    vmax = np.percentile(value[mask],85) if vmax is None else vmax
-    if vmin != vmax:
-        value = (value - vmin) / (vmax - vmin)  # vmin..vmax
-    else:
-        # Avoid 0-division
-        value = value * 0.
-    # squeeze last dim if it exists
-    # grey out the invalid values
-    value[invalid_mask] = np.nan
-    cmapper = matplotlib.colormaps[cmap]  # matplotlib.cm.get_cmap was removed in Matplotlib 3.9
-    if value_transform:
-        value = value_transform(value)
-        # value = value / value.max()
-    value = cmapper(value, bytes=True)  # (nxmx4)
-    # img = value[:, :, :]
-    img = value[...]
-    img[invalid_mask] = background_color
-    # gamma correction
-    img = img / 255
-    img = np.power(img, 2.2)
-    img = img * 255
-    img = img.astype(np.uint8)
-    img = Image.fromarray(img)
-    return img
-def get_zoe_depth_map(image):
-    with torch.autocast("cuda", enabled=True):
-        depth = model_zoe_n.infer_pil(image)
-    depth = colorize(depth, cmap="gray_r")
-    return depth
-```
-Now we're ready to go:
-```python
-import torch
-import numpy as np
-from PIL import Image
-from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline, AutoencoderKL
-from diffusers.utils import load_image
-controlnet = ControlNetModel.from_pretrained(
-    "diffusers/controlnet-zoe-depth-sdxl-1.0",
-    use_safetensors=True,
-    torch_dtype=torch.float16,
-).to("cuda")
-vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
-pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
-    "stabilityai/stable-diffusion-xl-base-1.0",
-    controlnet=controlnet,
-    vae=vae,
-    variant="fp16",
-    use_safetensors=True,
-    torch_dtype=torch.float16,
-).to("cuda")
-pipe.enable_model_cpu_offload()
-prompt = "pixel-art margot robbie as barbie, in a coupé . low-res, blocky, pixel art style, 8-bit graphics"
-negative_prompt = "sloppy, messy, blurry, noisy, highly detailed, ultra textured, photo, realistic"
-image = load_image("https://media.vogue.fr/photos/62bf04b69a57673c725432f3/3:2/w_1793,h_1195,c_limit/rev-1-Barbie-InstaVert_High_Res_JPEG.jpeg")
-controlnet_conditioning_scale = 0.55
-depth_image = get_zoe_depth_map(image).resize((1088, 896))
-generator = torch.Generator("cuda").manual_seed(978364352)
-images = pipe(
-    prompt, image=depth_image, num_inference_steps=50, controlnet_conditioning_scale=controlnet_conditioning_scale, generator=generator
-).images
-images[0]
-images[0].save("pixel-barbie.png")
-```
-![images_1](./barbie.png)
-For more details, check out the official documentation of [`StableDiffusionXLControlNetPipeline`](https://huggingface.co/docs/diffusers/main/en/api/pipelines/controlnet_sdxl).
-### Training
-Our training script was built on top of the official training script that we provide [here](https://github.com/huggingface/diffusers/blob/main/examples/controlnet/README_sdxl.md).
-#### Training data and Compute
-The model is trained on 3M image-text pairs from LAION-Aesthetics V2. The model is trained for 700 GPU hours on 80GB A100 GPUs.
-#### Batch size
-Data parallel with a per-GPU batch size of 8, for a total batch size of 256.
-#### Hyper Parameters
-Constant learning rate of 1e-5.
-#### Mixed precision
-fp16


+That's a copy of [diffusers/controlnet-zoe-depth-sdxl-1.0](https://huggingface.co/diffusers/controlnet-zoe-depth-sdxl-1.0)