Model Overview
This repository contains a highly optimized, INT8 quantized version of the model-1k-merge from the L2P (Latent-to-Pixel) framework.
It has been specifically repackaged and compressed for ComfyUI users who want the native 4K capabilities of L2P without the massive 19.6 GB VRAM and storage footprint of the original 16-bit model.
Quantization Details
This is a "healthy" mixed-precision quantization that carefully balances VRAM reduction with output fidelity:
- Size Reduction: Reduced from 19.6 GB to 7.19 GB (~63% smaller).
- Mixed Precision: The heaviest matrix layers (like `qkv` and feed-forward networks) are quantized to `INT8` with an `F32` scaling factor. Highly sensitive layers, including layer norms, biases, and the entire `local_decoder`, remain in `BF16` to prevent color banding and maintain pristine image quality.
- ComfyUI Ready: The state dict keys have been prefixed with `model.diffusion_model.` and the Attention Q/K/V tensors have been packed into a single matrix for seamless, drop-in compatibility with ComfyUI.
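The transformations listed above can be illustrated with a minimal, pure-Python sketch. It is not the conversion script used for this checkpoint: the symmetric absmax scheme, the one-scale-per-row granularity, the Q/K/V row order, and the key name `blocks.0.attn.qkv.weight` are all assumptions made for illustration. Only the `model.diffusion_model.` prefix comes from the description above.

```python
PREFIX = "model.diffusion_model."  # prefix named in the description above


def quantize_row_int8(row):
    """Symmetric absmax quantization with one float scale per row (an assumption)."""
    absmax = max(abs(w) for w in row)
    scale = absmax / 127.0 if absmax > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in row]
    return q, scale


def dequantize_row(q, scale):
    """Recover approximate float weights from INT8 values and their scale."""
    return [v * scale for v in q]


def pack_qkv(q_rows, k_rows, v_rows):
    """Stack Q, K and V weight rows into one matrix (row order is an assumption)."""
    return q_rows + k_rows + v_rows


def add_prefix(state_dict):
    """Return a copy of the state dict with every key prefixed."""
    return {PREFIX + key: value for key, value in state_dict.items()}


row = [0.5, -1.0, 0.25, 0.0]
q_vals, scale = quantize_row_int8(row)
restored = dequantize_row(q_vals, scale)
# Rounding to the nearest INT8 step bounds the error by half a step (scale / 2).
max_err = max(abs(a - b) for a, b in zip(row, restored))

# Hypothetical key name and tiny weights, for illustration only.
packed = pack_qkv([[1.0, 2.0]], [[3.0, 4.0]], [[5.0, 6.0]])
renamed = add_prefix({"blocks.0.attn.qkv.weight": packed})

print(q_vals, scale, max_err)
print(list(renamed))
```

The per-row scale is the piece that would be stored in `F32` next to each `INT8` matrix. Layers kept in `BF16` would simply skip `quantize_row_int8`.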
How to Use (ComfyUI)
- Download the `model-1k-merge-INT8.safetensors` file.
- Place it in your ComfyUI models directory: `ComfyUI/models/checkpoints/` (or your designated diffusion model folder).
- Load it using the standard `Load Checkpoint` node in ComfyUI.
- Because the model bypasses the traditional VAE memory bottlenecks, you can natively generate at massive resolutions (up to 4K) directly in pixel space.
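Before loading the file, it can be useful to check that the download matches the layout described above. The sketch below reads only the JSON header of a `.safetensors` file (an 8-byte little-endian length followed by JSON) using the standard library, so no tensor data is loaded. The helper names are hypothetical, and the path assumes the default checkpoints folder mentioned in the steps above.

```python
import json
import os
import struct
from collections import Counter


def read_safetensors_header(path):
    """Read only the JSON header: an 8-byte little-endian length, then JSON."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    header.pop("__metadata__", None)  # optional free-form metadata entry
    return header


def count_dtypes(header):
    """Count tensors per dtype string, e.g. 'I8', 'F32', 'BF16'."""
    return dict(Counter(info["dtype"] for info in header.values()))


def all_keys_prefixed(header, prefix="model.diffusion_model."):
    """Check that every tensor name starts with the expected prefix."""
    return all(name.startswith(prefix) for name in header)


if __name__ == "__main__":
    # Assumed location; adjust if you use a different model folder.
    path = "ComfyUI/models/checkpoints/model-1k-merge-INT8.safetensors"
    if os.path.exists(path):
        header = read_safetensors_header(path)
        print("tensors per dtype:", count_dtypes(header))
        print("all keys prefixed:", all_keys_prefixed(header))
    else:
        print("file not found:", path)
```

If the file matches the description, the dtype counts should show a mix of `I8`, `F32`, and `BF16` entries. The exact counts are not stated in this card.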
About the Original L2P Framework
An efficient transfer paradigm enabling high-quality, end-to-end pixel-space diffusion with minimal computational overhead and data requirements.
Pixel diffusion models have recently regained attention for visual generation. However, training advanced pixel-space models from scratch demands prohibitive computational and data resources. To address this, we propose the Latent-to-Pixel (L2P) transfer paradigm, an efficient framework that directly harnesses the rich knowledge of pre-trained LDMs to build powerful pixel-space models.
Key Innovations:
- No VAE Bottleneck: L2P discards the VAE in favor of large-patch tokenization, unlocking native 4K ultra-high resolution generation.
- Efficient Transfer: Freezes the source LDM's intermediate layers, exclusively training shallow layers to learn the latent-to-pixel transformation.
- Zero Real-Data Collection: Utilizes LDM-generated synthetic images as the sole training corpus. L2P fits an already smooth data manifold, enabling rapid convergence.
- Accessible Scaling: This strategy allows L2P to seamlessly migrate massive latent priors to the pixel space using only 8 GPUs.
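To see why patch size matters at high resolution, the short sketch below counts the tokens a patch-based model would process for one image. The patch sizes are hypothetical, since this card does not state the value L2P uses, and "4K" is taken here to mean 4096 x 4096 pixels, which is also an assumption.

```python
def token_count(height, width, patch_size):
    """Number of non-overlapping square patches covering an image."""
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")
    return (height // patch_size) * (width // patch_size)


side = 4096  # assumed meaning of "4K" for this illustration
for patch in (16, 32, 64):  # hypothetical patch sizes
    print(f"patch {patch:>2}: {token_count(side, side, patch):>6} tokens")
```

Doubling the patch size cuts the token count by a factor of four, and since full self-attention cost grows quadratically with the number of tokens, larger patches make very high resolutions far more tractable.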
Extensive experiments across mainstream LDM architectures show that L2P incurs negligible training overhead, yet performs on par with the source LDM on DPG-Bench and reaches 93% performance on GenEval.
Citation
If you use this model in your research or projects, please credit the original L2P authors:
@article{l2p2026,
title={L2P: Unlocking Latent Potential for Pixel Generation},
author={Original L2P Authors},
journal={arXiv preprint arXiv:2605.12013},
year={2026}
}
Model tree for Abiray/L2P-model-1k-merge-INT8
Base model: zhen-nan/L2P