mm-datasets SigLIP R6-C Tokenizer Checkpoint
This repository stores the R6-C SigLIP feature tokenizer checkpoint used by
mm-datasets.
Checkpoint
- File: siglip-r6c-50k-lr3e5.ckpt
- Expected tokenizer output: 196 integer tokens per RGB image
- Token grid: 14 x 14
- Token range: [0, 16806]
- Input preprocessing in mm-datasets: RGB image resized to 224 x 224
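The figures above fit together as 14 x 14 = 196 tokens, each an integer in [0, 16806]. As a minimal sketch, the helper below checks a token sequence against those numbers and lays it out as a grid. `check_tokens` is a hypothetical name and not part of mm-datasets, and the row-major layout is an assumption, since the checkpoint description does not state the token order.

```python
GRID_SIDE = 14                 # token grid is 14 x 14
NUM_TOKENS = GRID_SIDE ** 2    # 196 tokens per RGB image
MAX_TOKEN_ID = 16806           # inclusive upper bound of the token range


def check_tokens(tokens):
    """Validate a flat token sequence and return it as a list of 14 rows.

    Hypothetical helper for illustration; not part of mm-datasets.
    """
    tokens = list(tokens)
    if len(tokens) != NUM_TOKENS:
        raise ValueError(f"expected {NUM_TOKENS} tokens, got {len(tokens)}")
    for t in tokens:
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(t, bool) or not isinstance(t, int):
            raise TypeError(f"token {t!r} is not an integer")
        if not 0 <= t <= MAX_TOKEN_ID:
            raise ValueError(f"token {t} is outside [0, {MAX_TOKEN_ID}]")
    # Assumption: tokens are stored row by row (row-major order).
    return [
        tokens[row * GRID_SIDE:(row + 1) * GRID_SIDE]
        for row in range(GRID_SIDE)
    ]
```

If the tokenizer returns a tensor or array rather than a Python list, converting it with `.tolist()` before calling the helper should yield plain integers.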
The tokenizer code and bundled YAML config live in mm-datasets under:
mm_datasets/tokenizers/siglip/
Usage
Set the following environment variables:
export SIGLIP_TOKENIZER_HF_REPO_ID="kunalpratap10/mm-datasets-siglip-r6c-tokenizer"
export SIGLIP_TOKENIZER_HF_FILENAME="siglip-r6c-50k-lr3e5.ckpt"
SigLIPTokenizer first looks for a local checkpoint and uses it if available. If
no local checkpoint is found, it downloads this checkpoint from the Hugging Face Hub.
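The local-first lookup described above could look roughly like the sketch below. It is not the actual SigLIPTokenizer code: `resolve_checkpoint` and its `local_path` and `downloader` parameters are hypothetical names introduced for illustration. The only real API used is `hf_hub_download` from the `huggingface_hub` package, and the sketch assumes both environment variables from the Usage section are set.

```python
import os
from pathlib import Path


def resolve_checkpoint(local_path=None, downloader=None):
    """Return a checkpoint path: the local file if present, else a Hub download.

    Hypothetical helper for illustration; not the mm-datasets implementation.
    """
    # 1. Prefer a local checkpoint when one exists.
    if local_path is not None and Path(local_path).is_file():
        return str(local_path)

    # 2. Otherwise read the repository and filename from the environment.
    #    A missing variable raises KeyError.
    repo_id = os.environ["SIGLIP_TOKENIZER_HF_REPO_ID"]
    filename = os.environ["SIGLIP_TOKENIZER_HF_FILENAME"]

    if downloader is None:
        # Imported lazily so the local-only path does not need huggingface_hub.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download

    # hf_hub_download returns the path of the file in the local cache.
    return downloader(repo_id=repo_id, filename=filename)
```

The `downloader` argument exists only so the fallback can be exercised without network access, for example by passing a stub in a test.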
Provenance
This checkpoint corresponds to the R6-C run:
2026-06-01T12-11-33_r6c_50k_lr3e5
Original RCP checkpoint path:
/scratch/kunal/mm_dataset_proj/siglip_tokenizer/video-modeling-abstract/my_VidTok/logs/2026-06-01T12-11-33_r6c_50k_lr3e5/checkpoints/last.ckpt