Lumina-Next-SFT

Lumina-Next-SFT is a 2B-parameter Next-DiT model that uses Gemma-2B as its text encoder and has been enhanced through high-quality supervised fine-tuning (SFT).

Our generative model uses Next-DiT as the backbone, Gemma-2B as the text encoder, and a version of the SDXL VAE fine-tuned by stabilityai.

Lumina-Next Lumina-T2X paper

📰 News

  • [2024-07-08] 🎉🎉🎉 Lumina-Next is now supported in diffusers! Thanks to @yiyixuxu and @sayakpaul!

  • [2024-06-08] 🎉🎉🎉 We have released the Lumina-Next-SFT model.

  • [2024-05-28] We updated the Lumina-Next-T2I model to support 2K Resolution image generation.

  • [2024-05-16] We have converted the .pth weights to .safetensors weights. Please pull the latest code to use demo.py for inference.

  • [2024-05-12] We released Lumina-Next-T2I, the next version of Lumina-T2I, for faster image generation with lower memory usage.

🎮 Model Zoo

More checkpoints of our model will be released soon~

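| Resolution | Next-DiT Parameters | Text Encoder | Prediction | Download URL |
| ---------- | ------------------- | ------------ | -------------- | ------------ |
| 1024 | 2B | Gemma-2B | Rectified Flow | Hugging Face |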
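
The "Rectified Flow" entry in the Prediction column refers to training the model to predict a velocity along a straight path between noise and data. The toy sketch below illustrates that idea only; it is not Lumina-Next code. The function names, the time convention (t = 0 is noise, t = 1 is data), and the plain Euler integrator are assumptions made for illustration.

```python
# Toy illustration of rectified-flow prediction on plain Python lists.
# Assumed convention: t = 0 is pure noise, t = 1 is data.

def interpolate(noise, data, t):
    """Point on the straight path between noise and data at time t."""
    return [(1 - t) * n + t * d for n, d in zip(noise, data)]

def target_velocity(noise, data):
    """Velocity of the straight path; this is what a model would regress."""
    return [d - n for n, d in zip(noise, data)]

def euler_sample(velocity_fn, noise, num_steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(noise)
    dt = 1.0 / num_steps
    for step in range(num_steps):
        t = step * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

if __name__ == "__main__":
    noise, data = [0.0, 2.0], [1.0, -1.0]
    # Stand-in for a trained network: an oracle that returns the true velocity.
    oracle = lambda x, t: target_velocity(noise, data)
    print(euler_sample(oracle, noise))
```

With the oracle velocity, the sampler lands on the data point up to floating-point error; a trained network replaces the oracle in a real sampler.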

Installation

1. Create a conda environment and install PyTorch

Note: You may want to adjust the CUDA version according to your driver version.

conda create -n Lumina_T2X -y
conda activate Lumina_T2X
conda install python=3.11 pytorch==2.1.0 torchvision==0.16.0 torchaudio==2.1.0 pytorch-cuda=12.1 -c pytorch -c nvidia -y

2. Install dependencies

pip install diffusers huggingface_hub

3. Install flash-attn

pip install flash-attn --no-build-isolation
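
After the three installation steps, a quick check of which packages are actually present can save a failed first run. The sketch below uses only the standard library; the `installed_versions` helper and the package list are illustrative, and the distribution names are assumed to match what pip and conda register.

```python
from importlib import metadata

# Packages named in the installation steps above (distribution names are assumed).
PACKAGES = ["torch", "torchvision", "torchaudio", "diffusers", "huggingface_hub", "flash-attn"]

def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if the package is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated Lumina_T2X environment; any line reporting NOT INSTALLED points back to the corresponding step.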

Inference

  1. Prepare the pre-trained model

โญโญ (Recommended) you can use huggingface_cli to download our model:

huggingface-cli download --resume-download Alpha-VLLM/Lumina-Next-SFT-diffusers --local-dir /path/to/ckpt
  1. Run with demo code:
from diffusers import LuminaText2ImgPipeline
import torch

pipeline = LuminaText2ImgPipeline.from_pretrained("/path/to/ckpt/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

# or you can download the model using code directly
# pipeline = LuminaText2ImgPipeline.from_pretrained("Alpha-VLLM/Lumina-Next-SFT-diffusers", torch_dtype=torch.bfloat16).to("cuda")

image = pipeline(prompt="Upper body of a young woman in a Victorian-era outfit with brass goggles and leather straps. "
                        "Background shows an industrial revolution cityscape with smoky skies and tall, metal structures").images[0]
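
The demo code leaves the result in the `image` variable without writing it to disk. A minimal sketch of a wrapper that runs the pipeline once and saves the first image follows; `generate_and_save` and its default file name are hypothetical helpers, not part of diffusers, and the sketch assumes the pipeline returns PIL images in `.images`, as the demo code does.

```python
from pathlib import Path

def generate_and_save(pipeline, prompt, out_path="lumina_sample.png", **pipeline_kwargs):
    """Run a text-to-image pipeline once and save the first returned image.

    Extra keyword arguments are forwarded to the pipeline call unchanged.
    """
    result = pipeline(prompt=prompt, **pipeline_kwargs)
    image = result.images[0]  # first generated image, as in the demo code
    out_path = Path(out_path)
    out_path.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    image.save(out_path)
    return out_path
```

With the `pipeline` object from the demo, `generate_and_save(pipeline, "A steampunk cityscape", "outputs/steampunk.png")` writes the image and returns its path. Any additional keyword arguments should be checked against the diffusers documentation for LuminaText2ImgPipeline before use.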