PRXPixel (text-to-image, pixel space)

PRXPixel is a pixel-space variant of PRX: it denoises raw RGB directly (no VAE), conditions on a Qwen3-VL text encoder (rather than T5Gemma), and feeds the generation resolution into the timestep modulation. The denoiser is a ~7B PRXTransformer2DModel with a bottleneck patch projection and a resolution embedder.

  • Resolution: 1024
  • Transformer: ~7B params, torch.bfloat16
  • Text encoder: Qwen3-VL text tower (Qwen3VLTextModel)
  • VAE: none (pixel space)
  • Scheduler: FlowMatchEulerDiscreteScheduler
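
As a rough sizing aid, the parameter count and dtype listed above imply the weight footprint sketched below. This is back-of-the-envelope arithmetic only: it assumes exactly 7e9 parameters (the card says "~7B") and covers the transformer weights alone. The text encoder, activations, and framework overhead come on top and are not estimated here.

```python
# Back-of-the-envelope weight memory for the denoiser.
# Assumption: exactly 7e9 parameters; the model card only states "~7B".
BYTES_PER_PARAM = {"bfloat16": 2, "float16": 2, "float32": 4}


def weight_bytes(num_params: int, dtype: str) -> int:
    """Return the bytes needed to store `num_params` weights in `dtype`."""
    return num_params * BYTES_PER_PARAM[dtype]


def to_gib(num_bytes: int) -> float:
    """Convert bytes to binary gigabytes (GiB)."""
    return num_bytes / 2**30


transformer_bytes = weight_bytes(7_000_000_000, "bfloat16")
print(f"{transformer_bytes / 1e9:.1f} GB ({to_gib(transformer_bytes):.1f} GiB)")
# prints: 14.0 GB (13.0 GiB)
```

Loading the same weights in float32 would double this figure, which is one reason the usage example passes torch_dtype=torch.bfloat16.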

Requirements

PRXPixelPipeline is not yet available in a released version of diffusers. Install diffusers from the branch that adds it, and use transformers >= 4.57 (the version that introduced Qwen3VLTextModel):

pip install "transformers>=4.57"
pip install "git+https://github.com/huggingface/diffusers.git@prx-pixel-pipeline"
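
After installing, it can help to confirm that both requirements are actually met before downloading the weights. The following is a minimal sketch, not part of the official instructions: the helper names (numeric_release, meets_minimum, find_problems) are hypothetical, and the version comparison is deliberately simple (it looks only at the leading numeric components and ignores pre-release tags).

```python
import importlib
import re
from importlib import metadata


def numeric_release(version: str) -> tuple:
    """Leading numeric components, e.g. '4.57.0.dev0' -> (4, 57, 0).

    Simplification: pre-release suffixes such as 'rc1' are ignored.
    """
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, comparing numeric parts."""
    a, b = numeric_release(installed), numeric_release(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def find_problems(min_transformers: str = "4.57") -> list:
    """Return human-readable problems; an empty list means the setup looks fine."""
    problems = []
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        problems.append("transformers is not installed")
    else:
        if not meets_minimum(installed, min_transformers):
            problems.append(f"transformers {installed} is older than {min_transformers}")
    try:
        diffusers = importlib.import_module("diffusers")
        getattr(diffusers, "PRXPixelPipeline")
    except ImportError:
        problems.append("diffusers is not installed")
    except AttributeError:
        problems.append("this diffusers build does not export PRXPixelPipeline")
    except Exception as exc:  # e.g. a backend failing while the class is loaded
        problems.append(f"diffusers could not load PRXPixelPipeline: {exc}")
    return problems
```

Calling find_problems() and printing the result gives a quick report. If the list mentions PRXPixelPipeline, the most likely cause is that a released diffusers is installed instead of the branch above.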

Usage

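import torch
from diffusers import PRXPixelPipeline

# Load the weights in bfloat16, the dtype listed for the transformer above.
pipe = PRXPixelPipeline.from_pretrained("Photoroom/prxpixel-t2i", torch_dtype=torch.bfloat16)
# Move the whole pipeline to the GPU.
pipe.to("cuda")

prompt = "A front-facing portrait of a lion in the golden savanna at sunset."
# 28 denoising steps with a guidance scale of 5.0.
image = pipe(prompt, num_inference_steps=28, guidance_scale=5.0).images[0]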
image.save("prxpixel_output.png")
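
To give some intuition for the two sampling arguments above, here is a toy sketch in plain Python of how num_inference_steps and guidance_scale typically act in a flow-matching sampler with classifier-free guidance. It is an illustration under assumptions, not the PRXPixelPipeline implementation: it assumes the usual guidance formula and the Euler update sample + (sigma_next - sigma) * velocity, uses a linear sigma schedule (the real scheduler may shift it), and replaces the transformer with a hypothetical stand-in, make_toy_velocity, that returns an ideal velocity toward a fixed target.

```python
def guided(uncond, cond, scale):
    """Classifier-free guidance: extrapolate from uncond toward cond."""
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


def euler_step(x, velocity, sigma, sigma_next):
    """One Euler step of the flow ODE from noise level sigma to sigma_next."""
    return [xi + (sigma_next - sigma) * vi for xi, vi in zip(x, velocity)]


def sample(noise, velocity_fn, num_steps, guidance_scale):
    # Assumption: linear schedule from sigma=1 (pure noise) to sigma=0 (clean).
    sigmas = [1.0 - i / num_steps for i in range(num_steps + 1)]
    x = list(noise)
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        v_cond = velocity_fn(x, sigma, conditioned=True)
        v_uncond = velocity_fn(x, sigma, conditioned=False)
        v = guided(v_uncond, v_cond, guidance_scale)
        x = euler_step(x, v, sigma, sigma_next)
    return x


def make_toy_velocity(noise, prompt_target, generic_target):
    """Hypothetical stand-in for the denoiser.

    For x_sigma = (1 - sigma) * x0 + sigma * noise, the exact velocity is
    noise - x0. The toy ignores x and sigma and returns that constant.
    """
    def velocity_fn(x, sigma, conditioned):
        target = prompt_target if conditioned else generic_target
        return [n - t for n, t in zip(noise, target)]
    return velocity_fn


noise = [1.0, 0.3, -0.7]          # three "pixels" of starting noise
prompt_target = [0.2, -0.5, 0.9]  # what the prompted model aims for
generic_target = [0.0, 0.0, 0.0]  # what the unprompted model aims for
velocity_fn = make_toy_velocity(noise, prompt_target, generic_target)

plain = sample(noise, velocity_fn, num_steps=28, guidance_scale=1.0)
strong = sample(noise, velocity_fn, num_steps=28, guidance_scale=5.0)
print(plain)   # approximately prompt_target
print(strong)  # pushed past prompt_target, away from generic_target
```

In this toy, a scale of 1.0 lands on the prompted target, while 5.0 overshoots it by extrapolating away from the unprompted result. With a real model the velocity changes at every step, so more steps generally mean a finer integration at higher cost.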

License

Released under the Apache 2.0 license. See LICENSE and NOTICE.
