FlexTok Tokenizers & VAEs
Flexible 1D tokenizers and VAEs from https://flextok.epfl.ch/
Website | arXiv | GitHub | 🤗 Demo | BibTeX
Official implementation and pre-trained models for:
FlexTok: Resampling Images into 1D Token Sequences of Flexible Length, arXiv 2025
Roman Bachmann*, Jesse Allardice*, David Mizrahi*, Enrico Fini, Oğuzhan Fatih Kar, Elmira Amirloo, Alaaeldin El-Nouby, Amir Zamir, Afshin Dehghan
For install instructions, please see https://github.com/apple/ml-flextok.
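The collection also includes the FlexTok tokenizers themselves, which resample an image into an ordered 1D token sequence that can be truncated to a shorter length for a coarser reconstruction. The following is a minimal sketch: the FlexTokFromHub wrapper, the model ID, and the tokenize/detokenize arguments are taken from the GitHub repository's examples as best understood, and should be treated as assumptions that may differ across versions.

import torch
from flextok.flextok_wrapper import FlexTokFromHub
from flextok.utils.demo import imgs_from_urls

# Load a FlexTok tokenizer from the Hub (model ID assumed from this collection)
model = FlexTokFromHub.from_pretrained('EPFL-VILAB/flextok_d18_d28_in1k').eval()

# Load example images of shape (B, 3, H, W), normalized to [-1,1]
imgs = imgs_from_urls(urls=['https://storage.googleapis.com/flextok_site/nb_demo_images/0.png'])

with torch.no_grad():
    # Encode each image into an ordered 1D token sequence (one tensor per image)
    tokens_list = model.tokenize(imgs)
    # Decode back to images; keeping only a prefix of each sequence would give
    # a coarser reconstruction (the "flexible length" property). Here we keep all tokens.
    reconst = model.detokenize(
        tokens_list,
        timesteps=20,               # number of denoising steps (assumed value)
        guidance_scale=7.5,         # classifier-free guidance strength (assumed value)
        perform_norm_guidance=True,
    )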
To load the 4-channel VAE-GAN directly from the Hugging Face Hub and autoencode a sample image, run:
import torch
from diffusers.models import AutoencoderKL
from flextok.utils.demo import imgs_from_urls

vae = AutoencoderKL.from_pretrained(
    'EPFL-VILAB/flextok_vae_c4', low_cpu_mem_usage=False
).eval()

# Load example images of shape (B, 3, H, W), normalized to [-1,1]
imgs = imgs_from_urls(urls=['https://storage.googleapis.com/flextok_site/nb_demo_images/0.png'])

# Autoencode with the VAE (no gradients needed at inference time)
with torch.no_grad():
    latents = vae.encode(imgs).latent_dist.sample()  # Shape (B, 4, H//8, W//8)
    reconst = vae.decode(latents).sample  # Shape (B, 3, H, W)
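To inspect the result, you can map the reconstruction from [-1,1] back to [0,1] and write it to disk. This is a minimal sketch assuming torchvision is installed; the output filename is arbitrary.

from torchvision.utils import save_image

# Undo the [-1,1] normalization before saving ('reconstruction.png' is an arbitrary name)
save_image(reconst * 0.5 + 0.5, 'reconstruction.png')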
If you find this repository helpful, please consider citing our work:
@article{flextok,
  title={{FlexTok}: Resampling Images into 1D Token Sequences of Flexible Length},
  author={Roman Bachmann and Jesse Allardice and David Mizrahi and Enrico Fini and O{\u{g}}uzhan Fatih Kar and Elmira Amirloo and Alaaeldin El-Nouby and Amir Zamir and Afshin Dehghan},
  journal={arXiv 2025},
  year={2025},
}
The model weights in this repository are released under the Apple Model License for Research.