BiliSakura/MiniT2I-diffusers

Self-contained MiniT2I text-to-image checkpoints for Hugging Face diffusers. Each variant folder ships its own pipeline code, component modules, bundled FLAN-T5-Large text encoder, and transformer weights.

Converted from MiniT2I/MiniT2I using MiniT2I-diffusers in Visual-Generative-Foundation-Model-Collection.

Available checkpoints

| Subfolder | Model | Params (denoiser + text encoder) | Patch size | Recommended CFG |
|---|---|---|---|---|
| MiniT2I-B-16/ | MiniT2I-B/16 | 258M + 341M | 16 | 2.5 |
| MiniT2I-L-16/ | MiniT2I-L/16 | 912M + 341M | 16 | 6.0 |

Repo layout

BiliSakura/MiniT2I-diffusers/
├── README.md
├── MiniT2I-B-16/
│   ├── pipeline.py
│   ├── model_index.json
│   ├── conversion_metadata.json
│   ├── demo.png
│   ├── scheduler/
│   │   └── scheduler_config.json
│   ├── text_encoder/
│   ├── tokenizer/
│   └── transformer/
│       ├── config.json
│       ├── diffusion_pytorch_model.safetensors
│       └── transformer_minit2i.py
└── MiniT2I-L-16/
    └── ...

Each variant is self-contained: load with custom_pipeline=.../pipeline.py and trust_remote_code=True. MiniT2I denoises directly in RGB pixel space (no VAE).
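Because there is no VAE, the denoiser works on raw pixel patches. The arithmetic below is a rough sketch of what a patch size of 16 implies at 512×512, assuming non-overlapping square patches over a 3-channel RGB image (a common ViT-style layout that this README does not confirm). The helper name `patch_grid` is illustrative only.

```python
def patch_grid(image_size: int = 512, patch_size: int = 16, channels: int = 3):
    """Return (number of patches, values per patch) for a square image.

    Assumes non-overlapping square patches; this is an assumption about the
    architecture, not something read from transformer_minit2i.py.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size
    num_patches = per_side * per_side
    values_per_patch = patch_size * patch_size * channels
    return num_patches, values_per_patch


num_patches, values_per_patch = patch_grid()
print(num_patches, values_per_patch)  # 1024 768
```

Under these assumptions, each 512×512 image becomes 1024 patches of 768 pixel values each.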

Demo

MiniT2I-B-16 demo

Prompt: "A lonely astronaut standing on a quiet beach under two moons." Generated with MiniT2I-B/16 at 512×512, 100 steps, guidance_scale=2.5, seed 42.

Load from Hugging Face

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "BiliSakura/MiniT2I-diffusers/MiniT2I-B-16",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")

For MiniT2I-L/16, use MiniT2I-L-16 and guidance_scale=6.0.
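Some diffusers releases may reject an identifier with more than one slash as neither a local path nor a valid repo id. If that happens, one possible workaround is to download only the variant folder and load it as a local directory. The sketch below assumes `huggingface_hub.snapshot_download` with its `allow_patterns` argument. The helper names `variant_patterns` and `download_variant` are hypothetical and not part of this repository.

```python
from pathlib import Path

REPO_ID = "BiliSakura/MiniT2I-diffusers"


def variant_patterns(variant: str) -> list[str]:
    """Glob patterns that select a single variant folder inside the repo."""
    return [f"{variant}/*"]


def download_variant(variant: str = "MiniT2I-B-16") -> Path:
    """Download one variant folder and return its local path (needs network access)."""
    # Imported lazily so the pattern helper works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_dir = snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=variant_patterns(variant),
    )
    return Path(snapshot_dir) / variant
```

The returned path can then be used as `model_dir` in the local-clone loading code, with `custom_pipeline` pointing at its `pipeline.py`.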

Load from a local clone

from pathlib import Path
import torch
from diffusers import DiffusionPipeline

model_dir = Path("./MiniT2I-B-16").resolve()
pipe = DiffusionPipeline.from_pretrained(
    str(model_dir),
    local_files_only=True,
    custom_pipeline=str(model_dir / "pipeline.py"),
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(42)
image = pipe(
    "A lonely astronaut standing on a quiet beach under two moons.",
    num_inference_steps=100,
    guidance_scale=2.5,
    generator=generator,
).images[0]
image.save("demo.png")

Load a variant subfolder (e.g. ./MiniT2I-B-16), not the repo root.
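A quick sanity check before loading can catch the repo-root mistake early. The sketch below treats a folder as a variant folder when it contains `model_index.json` and `pipeline.py`, the two top-level files shown in the repo layout above. The helper names are illustrative, and the check is a heuristic rather than a rule enforced by diffusers.

```python
from pathlib import Path

# Files the repo layout lists at the top level of every variant folder.
REQUIRED_FILES = ("model_index.json", "pipeline.py")


def is_variant_dir(path) -> bool:
    """Return True if `path` looks like a variant folder rather than the repo root."""
    root = Path(path)
    return all((root / name).is_file() for name in REQUIRED_FILES)


def find_variant_dirs(repo_root) -> list[str]:
    """List the names of immediate subfolders that look like variant folders."""
    root = Path(repo_root)
    return sorted(p.name for p in root.iterdir() if p.is_dir() and is_variant_dir(p))
```

For example, `find_variant_dirs(".")` run from a local clone of the repo root should list the variant folders that are safe to pass to `from_pretrained`.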

Recommended inference settings

| Variant | Resolution | Steps | CFG scale | torch_dtype |
|---|---|---|---|---|
| MiniT2I-B-16 | 512×512 | 100 | 2.5 | bfloat16 |
| MiniT2I-L-16 | 512×512 | 100 | 6.0 | bfloat16 |

For GenEval / DPG-Bench evaluation, upstream configs use guidance_scale=5.0 for both B/16 and L/16.
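To keep these values in one place, a small lookup can be handy. The sketch below copies the step counts and guidance scales from this README. The dictionary layout and the `call_kwargs` helper are illustrative and do not ship with the checkpoints.

```python
# Values copied from the recommended-settings table; the structure is illustrative.
RECOMMENDED = {
    "MiniT2I-B-16": {"num_inference_steps": 100, "guidance_scale": 2.5},
    "MiniT2I-L-16": {"num_inference_steps": 100, "guidance_scale": 6.0},
}

# Guidance scale the upstream GenEval / DPG-Bench configs use for both variants.
BENCHMARK_GUIDANCE_SCALE = 5.0


def call_kwargs(variant: str, benchmark: bool = False) -> dict:
    """Return keyword arguments for the pipeline call for a given variant."""
    settings = dict(RECOMMENDED[variant])  # copy so the table is never mutated
    if benchmark:
        settings["guidance_scale"] = BENCHMARK_GUIDANCE_SCALE
    return settings
```

With a loaded pipeline, this could be used as `pipe(prompt, **call_kwargs("MiniT2I-L-16"))`.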

Interface notes

  • Text conditioning uses bundled google/flan-t5-large (T5EncoderModel + T5Tokenizer).
  • Scheduler is FlowMatchEulerDiscreteScheduler with 1000 training timesteps and shift=1.0.
  • guidance_scale > 1.0 enables classifier-free guidance with an empty-string null prompt.
  • Output resolution is fixed at 512×512 for these exports.
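The scheduler and guidance notes above can be illustrated with two textbook formulas. This is a minimal sketch in plain Python, not code taken from `pipeline.py`. It assumes the static timestep shift `shift * sigma / (1 + (shift - 1) * sigma)` and the usual classifier-free guidance combination. The function names are hypothetical.

```python
def shift_sigma(sigma: float, shift: float = 1.0) -> float:
    """Static timestep shift; with shift=1.0 the noise level is unchanged."""
    return shift * sigma / (1.0 + (shift - 1.0) * sigma)


def uses_cfg(guidance_scale: float) -> bool:
    """Guidance is only active for scales strictly above 1.0."""
    return guidance_scale > 1.0


def cfg_combine(cond, uncond, guidance_scale: float):
    """Standard classifier-free guidance on two equally sized lists of floats.

    `cond` is the prediction for the prompt and `uncond` the prediction for
    the null (empty-string) prompt.
    """
    return [u + guidance_scale * (c - u) for c, u in zip(cond, uncond)]
```

With `shift=1.0`, `shift_sigma` is the identity, so the noise schedule is left as is. With `guidance_scale=1.0`, `cfg_combine` returns the conditional prediction unchanged, which is why guidance only takes effect above 1.0.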

Regenerate bundles

From the repository root:

conda activate rsgen
python scripts/convert_minit2i_to_bilisakura.py

Citation

@misc{minit2i2026,
  title  = {MiniT2I: A Minimalist Baseline for Text-to-Image Generation},
  author = {Wang, Xianbang and Zhao, Hanhong and Lu, Yiyang and Zhou, Kangyang and Ma, Linrui and He, Kaiming},
  year   = {2026},
  url    = {https://peppaking8.github.io/#/post/minit2i}
}