L2P: Unlocking Latent Potential for Pixel Generation (INT8 Quantized)

📦 Model Overview

This repository contains an INT8-quantized version of the model-1k-merge checkpoint from the L2P (Latent-to-Pixel) framework.

It has been specifically repackaged and compressed for ComfyUI users who want the native 4K capabilities of L2P without the massive 19.6 GB VRAM and storage footprint of the original 16-bit model.

🔬 Quantization Details

This is a mixed-precision quantization that trades VRAM savings against output fidelity:

Size Reduction: Reduced from 19.6 GB to 7.19 GB (~63% smaller).
Mixed Precision: The largest weight matrices (such as qkv and the feed-forward networks) are quantized to INT8, each with an F32 scaling factor. Sensitive layers, including layer norms, biases, and the entire local_decoder, remain in BF16 to prevent color banding and preserve image quality.
ComfyUI Ready: The state dict keys have been prefixed with model.diffusion_model. and the Attention Q/K/V tensors have been packed into a single matrix for seamless, drop-in compatibility with ComfyUI.
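The points above can be illustrated with a minimal, pure-Python sketch. It is not the script used to produce this checkpoint: the symmetric per-row scheme, the Q/K/V stacking order, and the helper names (quantize_row_int8, dequantize_row, pack_qkv, add_prefix, size_reduction) are all assumptions made for illustration. Only the prefix string and the file sizes come from the text above.

```python
def quantize_row_int8(row):
    """Symmetric INT8 quantization of one weight row.

    Assumption: one float scale per row; the real checkpoint may use a
    different granularity (per tensor, per channel, per block).
    """
    max_abs = max((abs(w) for w in row), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the signed 8-bit range.
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from INT8 values and their scale."""
    return [v * scale for v in q]


def pack_qkv(q_rows, k_rows, v_rows):
    """Stack separate Q, K, V weight matrices (lists of rows) into one matrix.

    Assumption: the order is Q, then K, then V along the output dimension.
    """
    return q_rows + k_rows + v_rows


def add_prefix(state_dict, prefix="model.diffusion_model."):
    """Return a copy of the state dict with every key prefixed."""
    return {prefix + key: value for key, value in state_dict.items()}


def size_reduction(original_gb, quantized_gb):
    """Fraction of the original size that was saved."""
    return 1.0 - quantized_gb / original_gb


row = [0.5, -1.0, 0.25, 0.0]
q, scale = quantize_row_int8(row)
restored = dequantize_row(q, scale)
print("max abs error:", max(abs(a - b) for a, b in zip(row, restored)))
print("size reduction: {:.1%}".format(size_reduction(19.6, 7.19)))
```

With round-to-nearest, the per-weight error is bounded by half a quantization step (scale / 2), which is why layers with a wide dynamic range or high sensitivity are often left in a higher-precision format.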

🚀 How to Use (ComfyUI)

Download the model-1k-merge-INT8.safetensors file.
Place it in your ComfyUI models directory: ComfyUI/models/checkpoints/ (or your designated diffusion model folder).
Load it using the standard Load Checkpoint node in ComfyUI.
Because the model bypasses the traditional VAE memory bottlenecks, you can natively generate at massive resolutions (up to 4K) directly in pixel space.
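To check which precision each tensor uses after downloading, one option is to read only the JSON header of the safetensors file, which needs nothing beyond the Python standard library. The sketch below relies on the published safetensors layout (an 8-byte little-endian header length followed by a JSON header). The function name summarize_safetensors and the example path are hypothetical.

```python
import json
import os
import struct


def summarize_safetensors(path):
    """Return {dtype: {"tensors": count, "bytes": total}} from the file header.

    Only the header is read, so this does not load any weights into memory.
    """
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))

    # "__metadata__" is an optional free-form entry, not a tensor.
    header.pop("__metadata__", None)

    summary = {}
    for entry in header.values():
        begin, end = entry["data_offsets"]
        stats = summary.setdefault(entry["dtype"], {"tensors": 0, "bytes": 0})
        stats["tensors"] += 1
        stats["bytes"] += end - begin
    return summary


# Hypothetical location; adjust to wherever the file was placed.
example_path = "ComfyUI/models/checkpoints/model-1k-merge-INT8.safetensors"
if os.path.exists(example_path):
    for dtype, stats in sorted(summarize_safetensors(example_path).items()):
        print(dtype, stats["tensors"], "tensors,", stats["bytes"], "bytes")
```

If the description in the Quantization Details section holds, the output should be dominated by I8 bytes, with smaller BF16 and F32 totals.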

📖 About the Original L2P Framework

An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.

Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models.

Key Innovations:

No VAE Bottleneck: L2P discards the VAE in favor of large-patch tokenization, unlocking native 4K ultra-high resolution generation.
Efficient Transfer: Freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation.
Zero Real-Data Collection: Utilizes LDM-generated synthetic images as the sole training corpus. L2P fits an already smooth data manifold, enabling rapid convergence.
Accessible Scaling: This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs.
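As a rough illustration of why large-patch tokenization matters at 4K, the sketch below counts the tokens produced when an image is cut into non-overlapping square patches and shows a simple patchify routine. The patch sizes are hypothetical (this card does not state the one L2P uses), and the functions are a generic illustration rather than the L2P implementation.

```python
def num_patch_tokens(height, width, patch_size):
    """Number of tokens when an image is split into non-overlapping patches."""
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    return (height // patch_size) * (width // patch_size)


def patchify(image, patch_size):
    """Split a 2D single-channel image (list of rows) into flattened patches.

    Patches are emitted in row-major order, and each patch is flattened
    row by row.
    """
    height, width = len(image), len(image[0])
    num_patch_tokens(height, width, patch_size)  # validates divisibility
    tokens = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            tokens.append([
                image[top + dy][left + dx]
                for dy in range(patch_size)
                for dx in range(patch_size)
            ])
    return tokens


# Hypothetical patch sizes for a 4096 x 4096 image.
for patch_size in (16, 32, 64):
    print(patch_size, num_patch_tokens(4096, 4096, patch_size))
```

Doubling the patch side cuts the token count by a factor of four, which is what keeps the sequence length manageable when there is no VAE to downsample the image first.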

Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.

📜 Citation

If you use this model in your research or projects, please credit the original L2P authors:

@article{l2p2026,
  title={L2P: Unlocking Latent Potential for Pixel Generation},
  author={Original L2P Authors},
  journal={arXiv preprint arXiv:2605.12013},
  year={2026}
}


Model tree for Abiray/L2P-model-1k-merge-INT8

Base model: zhen-nan/L2P
Quantized versions: 2 (including this model)
