
Lumina-Next-SFT

Lumina-Next-SFT is a 2B-parameter Next-DiT model that uses Gemma-2B as its text encoder and has been enhanced through high-quality supervised fine-tuning (SFT).

Our generative model uses Next-DiT as the backbone and Gemma-2B as the text encoder; the VAE is a version of the SDXL VAE fine-tuned by stabilityai.

Lumina-Next Lumina-T2X paper


📰 News

  • [2024-07-08] 🎉🎉🎉 Lumina-Next is now supported in diffusers! Thanks to @yiyixuxu and @sayakpaul!

  • [2024-06-08] 🎉🎉🎉 We have released the Lumina-Next-SFT model.

  • [2024-05-28] We updated the Lumina-Next-T2I model to support 2K-resolution image generation.

  • [2024-05-16] We have converted the .pth weights to .safetensors weights. Please pull the latest code to use demo.py for inference.

  • [2024-05-12] We released the next version of Lumina-T2I, called Lumina-Next-T2I, an image generation model that is faster and uses less memory.

🎮 Model Zoo

More checkpoints of our model will be released soon~

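| Resolution | Next-DiT Parameters | Text Encoder | Prediction     | Download URL |
| ---------- | ------------------- | ------------ | -------------- | ------------ |
| 1024       | 2B                  | Gemma-2B     | Rectified Flow | Hugging Face |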

Installation

1. Create a conda environment and install PyTorch

Note: You may want to adjust the CUDA version according to your driver version.

conda create -n Lumina_T2X -y
conda activate Lumina_T2X
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

2. Install dependencies

pip install diffusers huggingface_hub

3. Install flash-attn

pip install flash-attn --no-build-isolation
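
After the three installation steps, it can help to confirm that the packages are visible from the active environment. The following is a minimal sketch using only the Python standard library. The helper names `check_environment` and `cuda_available` are hypothetical, and the mapping from import names to distribution names (for example `flash_attn` for `flash-attn`) is an assumption to verify against your own install.

```python
import importlib.util
from importlib import metadata

# Import name -> distribution name (assumed) for the packages installed above.
REQUIRED = {
    "torch": "torch",
    "diffusers": "diffusers",
    "huggingface_hub": "huggingface-hub",
    "flash_attn": "flash-attn",
}


def check_environment(required=REQUIRED):
    """Return {import_name: version string, or None if not importable}."""
    report = {}
    for module_name, dist_name in required.items():
        if importlib.util.find_spec(module_name) is None:
            report[module_name] = None
            continue
        try:
            report[module_name] = metadata.version(dist_name)
        except metadata.PackageNotFoundError:
            # Importable, but installed under a different distribution name.
            report[module_name] = "unknown"
    return report


def cuda_available():
    """Return True if PyTorch is installed and can see a CUDA device."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()


for name, version in check_environment().items():
    print(f"{name}: {version or 'MISSING'}")
```

If `torch` is reported but `cuda_available()` returns `False`, the CUDA version chosen in step 1 may not match the installed driver.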

Inference

  1. Prepare the pre-trained model

⭐⭐ (Recommended) You can use huggingface-cli to download our model:

huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT-diffusers --local-dir /path/to/ckpt
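
As a quick sanity check after the download, the sketch below looks for `model_index.json`, the manifest file that diffusers stores at the root of a pipeline checkpoint. The helper name `find_pipeline_root` is hypothetical. It assumes the files end up either directly in the `--local-dir` target or one directory below it, which should be checked against your huggingface_hub version.

```python
from pathlib import Path


def find_pipeline_root(local_dir):
    """Return the directory that contains model_index.json, or None.

    Checks local_dir itself first, then its immediate subdirectories.
    """
    root = Path(local_dir)
    if (root / "model_index.json").is_file():
        return root
    if not root.is_dir():
        return None
    for child in sorted(root.iterdir()):
        if child.is_dir() and (child / "model_index.json").is_file():
            return child
    return None
```

The returned path is the one to pass to `from_pretrained`; a `None` result suggests the download did not complete or went to a different directory.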

  2. Run the demo code:

import torch
from diffusers import LuminaText2ImgPipeline

# Load the pipeline from the directory populated by huggingface-cli above
# (--local-dir places the repository files directly in /path/to/ckpt).
pipeline = LuminaText2ImgPipeline.from_pretrained("/path/to/ckpt", torch_dtype=torch.bfloat16).to("cuda")

# Alternatively, download the model from the Hub directly in code:
# pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

# Generate one image; .images is a list of PIL images, so take the first.
image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. "
                        "Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
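
The demo code stops once `image` is in memory. The following is a minimal sketch of a wrapper that runs a pipeline and writes the first result to disk. The name `generate_and_save` and the output path are hypothetical. The wrapper only assumes that calling the pipeline returns an object with an `.images` list of PIL images, as in the demo. Extra keyword arguments are forwarded unchanged, so check the diffusers documentation for the options `LuminaText2ImgPipeline` actually accepts.

```python
from pathlib import Path


def generate_and_save(pipeline, prompt, out_path, **pipeline_kwargs):
    """Run `pipeline` on `prompt` and save the first image to `out_path`."""
    out_path = Path(out_path)
    # Create the output directory if it does not exist yet.
    out_path.parent.mkdir(parents=True, exist_ok=True)
    result = pipeline(prompt=prompt, **pipeline_kwargs)
    image = result.images[0]  # first generated image, as in the demo
    image.save(out_path)
    return out_path
```

With the `pipeline` object from the demo, `generate_and_save(pipeline, "a steampunk cityscape", "outputs/sample.png")` would write the result to `outputs/sample.png`.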