RAE-DiT-S ep14 Diffusers conversion
This is a Diffusers-format conversion of the public RAE Stage-2 ImageNet-256 checkpoint DiTDH-S_ep14, bundled with the public Stage-1 RAE nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08.
It is intended as a lightweight test artifact for the Diffusers RAE-DiT PR: https://github.com/huggingface/diffusers/pull/13231
Source assets
- Stage-1 RAE: nyu-visionx/RAE-dinov2-wReg-base-ViTXL-n08
- Stage-2 upstream weights: nyu-visionx/RAE-collections, file DiTs/Dinov2/wReg_base/ImageNet256/DiTDH-S_ep14/stage2_model.pt
- Upstream code/configs: https://github.com/bytetriper/RAE, config configs/stage2/training/ImageNet256/DiTDH-S_DINOv2-B.yaml
Usage
Until PR #13231 is merged, install Diffusers from the PR branch first:
pip install git+https://github.com/plugyawn/diffusers.git@rae-dit-training
Then run:
import torch
from diffusers import RAEDiTPipeline

repo_id = "plugyawn/rae-dit-s-ep14-diffusers"
pipe = RAEDiTPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

# Fixed seed so repeated runs start from the same initial noise.
generator = torch.Generator(device="cuda").manual_seed(0)

image = pipe(
    class_labels=207,  # ImageNet-1k class id
    num_inference_steps=25,
    guidance_scale=1.0,
    generator=generator,
).images[0]
image.save("rae_dit_class207.png")
class_labels takes ImageNet-1k class ids (integers from 0 to 999).
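Since the model is class-conditional, it can help to check class ids before passing them to the pipeline. The following is a minimal sketch, not part of Diffusers: the helper name normalize_class_labels and the constant NUM_IMAGENET_CLASSES are hypothetical, and the sketch only assumes that ImageNet-1k ids run from 0 to 999. Whether the pipeline accepts a list of labels is not stated here; the helper only validates the values.

```python
from typing import Iterable, List, Union

# ImageNet-1k has 1000 classes, so valid ids are 0..999 (assumption of this sketch).
NUM_IMAGENET_CLASSES = 1000


def normalize_class_labels(labels: Union[int, Iterable[int]]) -> List[int]:
    """Return the labels as a list of ints, rejecting ids outside 0..999.

    Hypothetical helper for illustration; it is not part of Diffusers.
    """
    if isinstance(labels, int):
        labels = [labels]
    checked = []
    for label in labels:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(label, bool) or not isinstance(label, int):
            raise TypeError(f"class id must be an int, got {type(label).__name__}")
        if not 0 <= label < NUM_IMAGENET_CLASSES:
            raise ValueError(
                f"class id {label} is outside 0..{NUM_IMAGENET_CLASSES - 1}"
            )
        checked.append(label)
    return checked
```

For example, normalize_class_labels(207) returns [207], while normalize_class_labels(1000) raises ValueError before any model call is made.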
Validation
The conversion was validated against the upstream implementation on an A100. With matched initial latent noise, class label, and schedule, the converted model differed from upstream by approximately max_abs_error=1.10e-5 on transformer outputs and max_abs_error=6.46e-5 on a fixed-seed 25-step decoded sample.
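For reference, max_abs_error here is read as the largest elementwise absolute difference between the two outputs. The sketch below shows that metric in plain Python on flat sequences of floats; it is an illustration under that assumption and not the script used for the validation above. The function name max_abs_error is chosen to mirror the reported metric.

```python
from typing import Iterable


def max_abs_error(reference: Iterable[float], converted: Iterable[float]) -> float:
    """Largest elementwise |reference - converted| over two flat sequences.

    Illustrative only. With torch tensors a and b of the same shape, the
    equivalent expression is (a - b).abs().max().item().
    """
    ref = list(reference)
    conv = list(converted)
    if len(ref) != len(conv):
        raise ValueError(f"length mismatch: {len(ref)} vs {len(conv)}")
    if not ref:
        raise ValueError("cannot compare empty sequences")
    return max(abs(r - c) for r, c in zip(ref, conv))


# Toy values standing in for upstream and converted outputs.
upstream_out = [1.0, 2.0, 3.0]
converted_out = [1.0, 2.5, 2.0]
error = max_abs_error(upstream_out, converted_out)
```

A single max over all elements is a strict metric: one badly mismatched element dominates the result, which suits a conversion check where every weight must have been mapped correctly.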