---
license: other
license_name: stability-ai-community
license_link: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/blob/main/LICENSE
base_model: stabilityai/stable-diffusion-3.5-medium
tags:
- stable-diffusion
- stable-diffusion-3
- controlnet
- albedo
- text-to-image
- image-to-image
library_name: diffusers
pipeline_tag: image-to-image
---

Albedo-Conditioned ControlNet for Stable Diffusion 3.5

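This is an albedo-conditioned ControlNet model built on Stable Diffusion 3.5 Medium. It takes an albedo map (the shading-free base color of a scene) together with a text prompt and generates an image whose lighting follows the prompt.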

Model Details

Base Model: Stable Diffusion 3.5 Medium
Checkpoint: {checkpoint_name}
Conditioning: Albedo maps + text prompts
Resolution: 512x512 (can be adapted to other resolutions)
Training Dataset: PixelProse (albedo + RGB pairs with captions)
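
For orientation, the 512x512 resolution maps onto SD3's latent grid as sketched below. This assumes the standard SD 3.5 Medium configuration (a VAE with 8x spatial downsampling and 16 latent channels, and a transformer that splits latents into 2x2 patches); the helper `latent_geometry` and its constants are illustrative, so check them against `vae.config` and `transformer.config`. Under these assumptions, any other resolution needs height and width divisible by 16 so the latent grid can be patchified evenly.

```python
from dataclasses import dataclass

# Assumed SD 3.5 Medium defaults; verify against vae.config / transformer.config.
VAE_SCALE_FACTOR = 8   # spatial downsampling of the VAE
LATENT_CHANNELS = 16   # channels in the SD3 VAE latent space
PATCH_SIZE = 2         # transformer patchifies latents into 2x2 patches


@dataclass
class LatentGeometry:
    latent_shape: tuple  # (channels, latent_height, latent_width)
    num_tokens: int      # number of image tokens seen by the transformer


def latent_geometry(height: int, width: int) -> LatentGeometry:
    """Hypothetical helper: latent shape and token count for an image size."""
    multiple = VAE_SCALE_FACTOR * PATCH_SIZE
    if height % multiple or width % multiple:
        raise ValueError(f"height and width must be divisible by {multiple}, got {height}x{width}")
    lh, lw = height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR
    tokens = (lh // PATCH_SIZE) * (lw // PATCH_SIZE)
    return LatentGeometry((LATENT_CHANNELS, lh, lw), tokens)


print(latent_geometry(512, 512))
# LatentGeometry(latent_shape=(16, 64, 64), num_tokens=1024)
```

At 1024x1024 the same calculation gives a 16x128x128 latent and 4096 image tokens, four times as many as at 512x512.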

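Usage

The example below needs two pieces that are not part of diffusers: the custom StableDiffusion3Pipeline from the training repository and the encode_intrinsics helper from light_utils. Make sure both are importable before running it.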

import torch
import numpy as np
from PIL import Image
from diffusers import AutoencoderKL, SD3Transformer2DModel

# Load base model components (VAE in bfloat16 to match the transformer and pipeline)
base_model = "stabilityai/stable-diffusion-3.5-medium"
vae = AutoencoderKL.from_pretrained(base_model, subfolder="vae", torch_dtype=torch.bfloat16)

# Load trained transformer
transformer = SD3Transformer2DModel.from_pretrained(
    "{model_id}",
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)

# Load your custom pipeline (from training repo)
from pipelines.pipeline_stable_diffusion_3 import StableDiffusion3Pipeline

pipeline = StableDiffusion3Pipeline.from_pretrained(
    base_model,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
pipeline.to("cuda")

# Load and prepare albedo image
albedo_image = Image.open("path/to/albedo.png").convert("RGB")
albedo_image = albedo_image.resize((512, 512))

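# Convert to a tensor in [-1, 1]: HWC -> CHW, then add two leading axes,
# giving shape (1, 1, 3, 512, 512) as passed to encode_intrinsics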
albedo_np = np.array(albedo_image).astype(np.float32) / 255.0
albedo_tensor = torch.from_numpy(albedo_np).permute(2, 0, 1) * 2.0 - 1.0
albedo_tensor = albedo_tensor.unsqueeze(0).unsqueeze(0).to("cuda", dtype=torch.bfloat16)

# Encode albedo to control latents
from light_utils import encode_intrinsics
control_latents = encode_intrinsics(albedo_tensor, vae, torch.bfloat16)

# Generate
prompt = "A beautiful landscape, soft golden hour lighting"
image = pipeline(
    prompt=prompt,
    control_image=control_latents,
    num_inference_steps=50,
    guidance_scale=7.5,
    height=512,
    width=512,
).images[0]

image.save("output.png")
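
The preparation steps above resize the albedo map straight to 512x512, which stretches non-square inputs. A minimal alternative, assuming a center crop is acceptable for your albedo maps, crops to a square first and then applies the same [-1, 1] normalization and (1, 1, 3, H, W) layout as the example. `center_crop_square` and `to_model_range` are hypothetical helpers written in NumPy; the two leading axes simply mirror the two `unsqueeze(0)` calls above.

```python
import numpy as np


def center_crop_square(image: np.ndarray) -> np.ndarray:
    """Crop an H x W x C array to its largest centered square."""
    h, w = image.shape[:2]
    side = min(h, w)
    top = (h - side) // 2
    left = (w - side) // 2
    return image[top:top + side, left:left + side]


def to_model_range(image: np.ndarray) -> np.ndarray:
    """Map uint8 H x W x 3 to float32 (1, 1, 3, H, W) in [-1, 1]."""
    x = image.astype(np.float32) / 255.0   # [0, 255] -> [0, 1]
    x = x.transpose(2, 0, 1) * 2.0 - 1.0   # HWC -> CHW, [0, 1] -> [-1, 1]
    return x[np.newaxis, np.newaxis]        # add the two leading axes


# Synthetic 600 x 800 white albedo map as a stand-in for a real file
albedo = np.full((600, 800, 3), 255, dtype=np.uint8)
square = center_crop_square(albedo)
batch = to_model_range(square)
print(square.shape, batch.shape)
```

With a real file, resize after cropping, for example `Image.fromarray(center_crop_square(np.array(img))).resize((512, 512))`, then pass the output of `to_model_range` through `torch.from_numpy(...).to("cuda", dtype=torch.bfloat16)`.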

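encode_intrinsics comes from the training code and is not documented here. If it follows the usual SD3 convention (an assumption, not confirmed by this card), it VAE-encodes the [-1, 1] albedo tensor and then shifts and scales the latents with the VAE's `shift_factor` and `scaling_factor`, the same constants the diffusers SD3 pipeline uses in reverse before `vae.decode`. The NumPy sketch below shows only that normalization step; `normalize_latents` and `denormalize_latents` are hypothetical names, and the constants in the demo are placeholders rather than the real VAE values.

```python
import numpy as np


def normalize_latents(z: np.ndarray, scaling_factor: float, shift_factor: float) -> np.ndarray:
    """Raw VAE latents -> the scaled space the SD3 transformer works in."""
    return (z - shift_factor) * scaling_factor


def denormalize_latents(z: np.ndarray, scaling_factor: float, shift_factor: float) -> np.ndarray:
    """Inverse mapping, applied before decoding with the VAE."""
    return z / scaling_factor + shift_factor


# Random stand-in for sampled VAE latents of one 512 x 512 image: (1, 16, 64, 64)
rng = np.random.default_rng(0)
raw = rng.normal(size=(1, 16, 64, 64)).astype(np.float32)

# Placeholder constants; use vae.config.scaling_factor and vae.config.shift_factor in practice
scaled = normalize_latents(raw, scaling_factor=1.5, shift_factor=0.1)
restored = denormalize_latents(scaled, scaling_factor=1.5, shift_factor=0.1)
print(np.abs(restored - raw).max())
```

In PyTorch the same two expressions apply unchanged to `vae.encode(x).latent_dist.sample()`.
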
Lighting Control

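The albedo map fixes surface colors but not illumination, so lighting is set through the prompt. Keeping the same control latents and changing only the lighting description changes the lighting and mood of the output: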

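# Same albedo, different lighting descriptions
prompts = [
    "A forest scene, at sunrise",
    "A forest scene, with fluorescent blue lighting",
]

for i, prompt in enumerate(prompts):
    image = pipeline(
        prompt=prompt,
        control_image=control_latents,
        num_inference_steps=50,
        guidance_scale=7.5,
        height=512,  # match the 512x512 control latents
        width=512,
    ).images[0]
    # Each output keeps the albedo layout but differs in lighting/mood
    image.save(f"output_lighting_{i}.png")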
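
To compare lighting conditions fairly, it helps to hold everything except the lighting phrase fixed, including the random seed. The sketch below builds such a sweep, one job per lighting description with a shared seed and a descriptive output filename; `build_lighting_sweep` is a hypothetical helper and the seed value is arbitrary.

```python
import re
from typing import Dict, List


def build_lighting_sweep(scene: str, lightings: List[str], seed: int = 0) -> List[Dict]:
    """One job per lighting description, all sharing the same scene text and seed."""
    jobs = []
    for lighting in lightings:
        # Turn the lighting phrase into a filesystem-friendly slug
        slug = re.sub(r"[^a-z0-9]+", "_", lighting.lower()).strip("_")
        jobs.append({
            "prompt": f"{scene}, {lighting}",
            "seed": seed,
            "filename": f"output_{slug}.png",
        })
    return jobs


for job in build_lighting_sweep("A forest scene", ["at sunrise", "with fluorescent blue lighting"], seed=42):
    print(job)
```

Each job can then be run as `pipeline(prompt=job["prompt"], control_image=control_latents, generator=torch.Generator("cuda").manual_seed(job["seed"]), height=512, width=512).images[0].save(job["filename"])`. `generator` is a standard diffusers pipeline argument; that the custom pipeline accepts it unchanged is an assumption.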

License

This model inherits the Stability AI Community License from Stable Diffusion 3.5 Medium. See: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/blob/main/LICENSE
