CLIMP-Mamba2

Contrastive Language-Image Mamba Pretraining (CLIMP) model that uses Mamba2-1.3B as the text encoder.

Model Description

| Component | Details |
|---|---|
| Vision Encoder | VMamba-Base (128-256-512-1024 dims, depths [2,2,15,2]) |
| Text Encoder | Mamba2-1.3B (AntonV/mamba2-1.3b-hf) |
| Projection Dim | 768 |
| Training Data | CC12M |
| Image Resolution | 224x224 |
| Loss | Symmetric InfoNCE (learned temperature) |
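
For readers unfamiliar with the loss named above, here is a minimal sketch of a symmetric InfoNCE objective in plain Python. It is not the training code from the CLIMP repository: the function names are hypothetical, the temperature is passed as a plain float in log space (an assumption; in training it would be a learned parameter), and the embeddings are small lists standing in for 768-dimensional projections.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_z - row[i]
    return total / len(logits)

def symmetric_info_nce(image_emb, text_emb, log_temperature):
    """Average of the image-to-text and text-to-image InfoNCE losses.

    image_emb[i] and text_emb[i] are assumed to be a matching pair.
    log_temperature is hypothetical: log of the softmax temperature.
    """
    img = [l2_normalize(v) for v in image_emb]
    txt = [l2_normalize(v) for v in text_emb]
    scale = math.exp(-log_temperature)  # 1 / temperature
    # logits[i][j] = scaled cosine similarity of image i and text j
    logits = [[scale * sum(a * b for a, b in zip(i, t)) for t in txt] for i in img]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    return 0.5 * (cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_t))
```

With correctly paired embeddings the loss approaches zero as the temperature shrinks; with a very large temperature all logits flatten and the loss approaches log(batch size).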

Usage

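from models import load_climp
from data.utils import transform_image

# Load the CLIMP variant that uses the Mamba2 text encoder
model = load_climp("mamba2")
# Build the image preprocessing transform for 224x224 inputs
transform = transform_image(224)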

See the demo repository for evaluation code.
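
Contrastive image-text models are commonly evaluated by ranking captions against an image using cosine similarity in the shared embedding space. The sketch below illustrates that ranking step only. It does not call the CLIMP model: how embeddings are obtained from `model` is not shown in this card, so the vectors here are toy placeholders and `rank_captions` is a hypothetical helper.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_captions(image_emb, caption_embs):
    """Return caption indices sorted from most to least similar, plus raw scores."""
    scores = [cosine_similarity(image_emb, c) for c in caption_embs]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return order, scores

# Toy 3-dimensional stand-ins; the real projection dimension is 768.
image_emb = [0.9, 0.1, 0.0]
caption_embs = [
    [0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
order, scores = rank_captions(image_emb, caption_embs)
```

The first index in `order` is the caption whose embedding points in the direction closest to the image embedding.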

Paper

CLIMP: Contrastive Language-Image Mamba Pretraining

@article{climp2026,
  title={CLIMP: Contrastive Language-Image Mamba Pretraining},
  author={Shabtay, Nimrod and Zimerman, Itamar and Schwartz, Eli and Giryes, Raja},
  journal={arXiv preprint arXiv:2601.06891},
  year={2026}
}