CLIMP-Mamba2

Contrastive Language-Image Mamba Pretraining (CLIMP) model that uses Mamba2-1.3B as the text encoder.

Model Description

Component	Details
Vision Encoder	VMamba-Base (128-256-512-1024 dims, depths [2,2,15,2])
Text Encoder	Mamba2-1.3B (AntonV/mamba2-1.3b-hf)
Projection Dim	768
Training Data	CC12M
Image Resolution	224x224
Loss	Symmetric InfoNCE (learned temperature)
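
The loss row above can be illustrated with a minimal, framework-free sketch of a symmetric InfoNCE objective. This is not the training code from the CLIMP repository: the function names and the choice to parameterize the learned temperature as `log_temperature` are assumptions made for illustration, and a real implementation would use batched tensor operations.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    peak = max(row)
    log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
    return [x - log_sum_exp for x in row]


def symmetric_info_nce(image_emb, text_emb, log_temperature):
    """Average of image-to-text and text-to-image cross-entropy.

    image_emb[i] and text_emb[i] are assumed to be a matched pair.
    log_temperature is a hypothetical scalar standing in for the
    learned temperature; logits are divided by exp(log_temperature).
    """
    images = [l2_normalize(v) for v in image_emb]
    texts = [l2_normalize(v) for v in text_emb]
    scale = math.exp(-log_temperature)  # equals 1 / temperature
    n = len(images)

    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [
        [scale * sum(a * b for a, b in zip(images[i], texts[j])) for j in range(n)]
        for i in range(n)
    ]

    # Image-to-text direction: the correct text for image i is column i.
    image_to_text = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-image direction: transpose and repeat.
    columns = [list(col) for col in zip(*logits)]
    text_to_image = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (image_to_text + text_to_image)
```

With matched pairs the loss approaches zero as the temperature shrinks, while mismatched pairs are penalized in both directions, which is what makes the objective symmetric.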

Usage

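# load_climp and transform_image are defined in the project's own modules
# (models and data.utils), so run this from a checkout of that code.
from models import load_climp
from data.utils import transform_image

model = load_climp("mamba2")      # CLIMP variant with the Mamba2 text encoder
transform = transform_image(224)  # image preprocessing at the 224x224 input resolution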
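
Once image and text embeddings are available, a contrastively trained model is typically used by ranking candidates by cosine similarity. The sketch below shows only that ranking step on plain Python lists. How embeddings are obtained from the loaded model is not shown in this card, so `rank_captions` and its inputs are hypothetical and independent of the CLIMP API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_captions(image_embedding, caption_embeddings):
    """Return caption indices ordered from most to least similar to the image.

    Both arguments are hypothetical stand-ins for vectors in the shared
    projection space (768 dimensions for this model).
    """
    scores = [cosine_similarity(image_embedding, c) for c in caption_embeddings]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy 2-D example: the second caption points almost the same way as the image.
order = rank_captions([1.0, 0.0], [[0.0, 1.0], [2.0, 0.1], [-1.0, 0.0]])
```

The same ranking, applied with class-name prompts as the captions, gives zero-shot classification.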

See the demo repository for evaluation code.

Paper

CLIMP: Contrastive Language-Image Mamba Pretraining

@article{climp2026,
  title={CLIMP: Contrastive Language-Image Mamba Pretraining},
  author={Shabtay, Nimrod and Zimerman, Itamar and Schwartz, Eli and Giryes, Raja},
  journal={arXiv preprint arXiv:2601.06891},
  year={2026}
}

Safetensors

Model size	1B params
Tensor type	F32
