---
license: apache-2.0
language:
- en
library_name: diffusers
tags:
- diffusers
- image-generation
- class-conditional
- nit
pipeline_tag: unconditional-image-generation
widget:
- output:
    url: demo_images/demo_sde250_class207_seed42.png
---
# NiT-XL Diffusers (Class-Conditional)

Native-resolution Image Transformer (NiT-XL) checkpoint packaged as a Diffusers-style repository with vendored custom code.
## What is included

- `transformer/`: `NiTTransformer2DModel` weights + config
- `scheduler/`: `NiTFlowMatchScheduler` config
- `vae/`: `AutoencoderDC` weights + config
- `custom_pipeline/`: local, self-contained implementations of `NiTPipeline`, `NiTTransformer2DModel`, and `NiTFlowMatchScheduler`
- `test_inference.py`: standalone sampling script
This repository does not depend on an external NiT-diffusers checkout during inference. It also includes a root `pipeline.py` custom entrypoint for Diffusers dynamic loading.
## Quickstart

### 1) Environment

Install dependencies (example):

```bash
pip install torch diffusers safetensors
```

If using this project environment:

```bash
conda activate rsgen
```
### 2) Generate a demo image

Run from this repository root:

```bash
python test_inference.py \
  --class-label 207 \
  --height 512 \
  --width 512 \
  --steps 250 \
  --mode sde \
  --guidance-scale 2.05 \
  --guidance-low 0.0 \
  --guidance-high 0.7 \
  --output demo_images/demo_sde250_class207_seed42.png
```
## Python usage

```python
from pathlib import Path

import torch
from diffusers import DiffusionPipeline

model_dir = Path(".").resolve()
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if device == "cuda" and torch.cuda.is_bf16_supported() else torch.float32

pipe = DiffusionPipeline.from_pretrained(
    model_dir,
    custom_pipeline=str(model_dir / "pipeline.py"),
    local_files_only=True,
).to(device)
if device == "cuda":
    pipe.transformer.to(dtype=dtype)
    pipe.vae.to(dtype=dtype)

gen = torch.Generator(device=device).manual_seed(42)
result = pipe(
    class_labels=[207],
    height=512,
    width=512,
    num_inference_steps=250,
    mode="sde",
    guidance_scale=2.05,
    guidance_interval=(0.0, 0.7),
    generator=gen,
)

# Ensure the output directory exists before saving.
Path("demo_images").mkdir(exist_ok=True)
result.images[0].save("demo_images/sample.png")
```
For remote Hub loading:

```python
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/NiT-XL-diffusers",
    custom_pipeline="pipeline",
)
```
## Recommended inference settings

- Resolution: 512x512
- Mode: `sde`
- Steps: 250
- Guidance scale: 2.05
- Guidance interval: (0.0, 0.7)
Using very low steps (for example 2) is only a smoke test and will produce low-quality images.
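The guidance interval restricts classifier-free guidance to part of the denoising trajectory instead of applying it at every step. As a rough illustration only (assuming, as is common, that the interval bounds are fractions of elapsed steps; the vendored scheduler's exact convention may differ), the recommended `(0.0, 0.7)` with 250 steps would apply guidance during the first 70% of the trajectory:

```python
def guided_steps(num_steps: int, low: float, high: float) -> list[int]:
    """Indices of steps whose trajectory fraction i/num_steps lies in [low, high)."""
    return [i for i in range(num_steps) if low <= i / num_steps < high]


# With the recommended settings, guidance is active on 175 of 250 steps.
steps = guided_steps(250, 0.0, 0.7)
print(len(steps), steps[0], steps[-1])  # 175 0 174
```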
## Demo

![Demo: class 207, SDE sampler, 250 steps, seed 42](demo_images/demo_sde250_class207_seed42.png)
## Citation

If you use this model or the NiT method in your work, please cite:
```bibtex
@article{wang2025native,
  title={Native-Resolution Image Synthesis},
  author={Wang, Zidong and Bai, Lei and Yue, Xiangyu and Ouyang, Wanli and Zhang, Yiyuan},
  year={2025},
  eprint={2506.03131},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
## Notes

- This is a class-conditional generator (ImageNet label ids), not a text-to-image model.
- For reproducibility, set `--seed`.
- The vendored custom pipeline keeps inference behavior consistent without external code dependencies.
