PixelDiT ControlNet + IP-Adapter

ControlNet scribble conditioning and IP-Adapter style transfer for PixelDiT-1300M.

Note: PixelDiT-1300M is a model by NVIDIA Research. This repo contains trained adapters only — we are not affiliated with NVIDIA.

Files

File	Description
`controlnet.safetensors`	Combined ControlNet (7 blocks) + IP-Adapter weights
`ip_adapter.safetensors`	IP-Adapter weights only
`hed_detector.safetensors`	HED edge detector (Apache-2.0, VGG-based)
`config.json`	Model config
`train.py`	Joint ControlNet + IP-Adapter training script
`precompute_wd_tags.py`	Run WD tagger on dataset → `wd_tags.json`
`precompute_embeddings.py`	Encode images with SigLIP + Gemma → memmap files
`precompute_hed.py`	Precompute HED edge maps for a dataset
`control_maps.py`	Edge map post-processing utilities
`hed.py`	HED model definition
`convert_to_safetensors.py`	Convert .pt checkpoints to safetensors
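
To check what a downloaded checkpoint contains without loading it into PyTorch, the safetensors header can be read with the standard library alone. The sketch below relies only on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function names and the local path `controlnet.safetensors` are illustrative assumptions, not part of this repo's scripts.

```python
import json
import struct
from pathlib import Path


def read_safetensors_header(path):
    """Return the JSON header of a .safetensors file as a dict."""
    with open(path, "rb") as f:
        # The first 8 bytes hold the header length as a little-endian uint64.
        (header_len,) = struct.unpack("<Q", f.read(8))
        return json.loads(f.read(header_len).decode("utf-8"))


def summarize(path):
    """Map tensor name -> (dtype, shape), skipping the optional metadata entry."""
    header = read_safetensors_header(path)
    return {
        name: (info["dtype"], tuple(info["shape"]))
        for name, info in header.items()
        if name != "__metadata__"
    }


if __name__ == "__main__":
    ckpt = Path("controlnet.safetensors")  # assumed local download location
    if ckpt.exists():
        for name, (dtype, shape) in sorted(summarize(ckpt).items()):
            print(f"{name}: {dtype} {shape}")
```

Listing the tensor names this way is a quick check that the combined file holds both the ControlNet blocks and the IP-Adapter weights.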

Usage

from diffusers.pipelines.pixeldit import PixelDiTStyledPipeline
from huggingface_hub import hf_hub_download
from PIL import Image
import torch

# Load the pipeline from madtune/pixeldit-diffusers together with the
# ControlNet, IP-Adapter, and HED detector weights from this repo.
pipe = PixelDiTStyledPipeline.from_pretrained_styled(
    "madtune/pixeldit-diffusers",
    controlnet_path=hf_hub_download("madtune/pixeldit-controlnet", "controlnet.safetensors"),
    ip_adapter_path=hf_hub_download("madtune/pixeldit-controlnet", "ip_adapter.safetensors"),
    hed_ckpt_path=hf_hub_download("madtune/pixeldit-controlnet", "hed_detector.safetensors"),
    torch_dtype=torch.bfloat16,
)
# gpu_id=1 selects the second CUDA device; use gpu_id=0 on a single-GPU machine.
pipe.enable_model_cpu_offload(gpu_id=1)

out = pipe(
    image=Image.open("style_ref.jpg"),
    prompt="gothic pale woman, dramatic rim lighting",
    variation_strength=0.85,
    ctrl_strength=0.25,
    ip_strength=0.85,
    flow_shift=8.0,
    guidance_scale=4.5,  # lower this to 3.0–3.5 for less saturated colours
    num_inference_steps=50,
).images[0]
out.save("output.jpg")

Recommended settings

Mode	`ctrl_strength`	`ip_strength`	`variation_strength`
Pure variation	0.0	0.0	0.65–0.85
ControlNet only	0.25	0.0	0.85
IP-Adapter only	0.0	0.85	0.85
Full combo (best)	0.25	0.35–0.85	0.85
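
The table can also be kept as a small lookup so the three strengths are always passed together. This is a minimal sketch: the mode keys and the `preset_kwargs` helper are hypothetical names, and only the numeric values come from the table. Ranges are stored as `(low, high)` tuples and resolved with a `position` between 0 and 1.

```python
# Values copied from the recommended-settings table; ranges are (low, high) tuples.
PRESETS = {
    "pure_variation": {"ctrl_strength": 0.0, "ip_strength": 0.0, "variation_strength": (0.65, 0.85)},
    "controlnet_only": {"ctrl_strength": 0.25, "ip_strength": 0.0, "variation_strength": 0.85},
    "ip_adapter_only": {"ctrl_strength": 0.0, "ip_strength": 0.85, "variation_strength": 0.85},
    "full_combo": {"ctrl_strength": 0.25, "ip_strength": (0.35, 0.85), "variation_strength": 0.85},
}


def preset_kwargs(mode, position=1.0):
    """Resolve a preset into concrete keyword arguments.

    `position` picks a point inside any (low, high) range:
    0.0 gives the low end, 1.0 the high end.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must be between 0 and 1")
    resolved = {}
    for key, value in PRESETS[mode].items():
        if isinstance(value, tuple):
            low, high = value
            value = round(low + position * (high - low), 4)
        resolved[key] = value
    return resolved


# Intended use with the pipeline from the Usage section (not executed here):
#   out = pipe(image=ref, prompt=text, **preset_kwargs("full_combo")).images[0]
```

Keeping the presets in one place makes it easier to compare modes on the same reference image and prompt.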

`flow_shift=8.0` with `guidance_scale` between 3.0 and 3.5 works well at resolutions of 768 px and above. `guidance_scale=4.5`, as used in the usage example, is valid but produces oversaturated colours.
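
For intuition about what a shift of 8.0 does, the sketch below applies the static timestep shift used by common flow-matching schedulers, `shift * sigma / (1 + (shift - 1) * sigma)`. Whether this pipeline applies exactly this formula is an assumption; this card does not say. The function names are illustrative.

```python
def shift_sigma(sigma, shift):
    """Static timestep shift: shift * sigma / (1 + (shift - 1) * sigma)."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def shifted_schedule(num_steps, shift):
    """Linearly spaced noise levels from 1.0 down to 1/num_steps, then shifted."""
    sigmas = [1.0 - i / num_steps for i in range(num_steps)]
    return [shift_sigma(s, shift) for s in sigmas]


if __name__ == "__main__":
    # Print every tenth noise level for 50 steps (assumed formula, see above).
    for i, s in enumerate(shifted_schedule(50, 8.0)):
        if i % 10 == 0:
            print(f"step {i:2d}: sigma = {s:.3f}")
```

Under this assumption, the midpoint noise level 0.5 maps to about 0.889 with a shift of 8.0, so more of the sampling steps are spent at high noise levels, which is the usual motivation for larger shifts at higher resolutions.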
