Unofficial port of the VQ model in the Latent Diffusion Model (LDM)

This is an unofficial port of the VQ model in the Latent Diffusion Model (LDM).

Model Details

See models/first_stage_models/vq-f16 in the original repository.

How to use

python -m venv .venv && source .venv/bin/activate
pip install huggingface-hub
huggingface-cli download ktrk115/ldm-vq-f16 requirements.txt | xargs cat > requirements.txt
pip install -r requirements.txt

import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModel

image_processor = AutoImageProcessor.from_pretrained("ktrk115/ldm-vq-f16", trust_remote_code=True)
model = AutoModel.from_pretrained("ktrk115/ldm-vq-f16", trust_remote_code=True)

# Image reconstruction
img = Image.open("path/to/image.png")
example = image_processor(img)
with torch.inference_mode():
    recon, _ = model.model(example["image"].unsqueeze(0))
recon_img = image_processor.postprocess(recon[0])
recon_img.save("recon.png")