This model is the stage-2 checkpoint for CLIP+DINOv2@336, one of the thirteen settings used in the paper "Law of Vision Representation in MLLMs".

Format: Safetensors
Model size: 7.37B params
Tensor type: BF16
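
The parameter count and tensor type listed above give a rough idea of how much memory the weights alone need. The following is a back-of-the-envelope sketch, not a measurement: it assumes every parameter is stored in BF16 (2 bytes each) and ignores activations, the KV cache, and framework overhead. The names `PARAMS`, `BYTES_PER_PARAM`, and `weight_bytes` are illustrative only.

```python
# Rough estimate of weight memory from the figures on this card.
# Assumption: all 7.37B parameters are stored as BF16 (16 bits = 2 bytes).
PARAMS = 7.37e9        # parameter count reported above
BYTES_PER_PARAM = 2    # BF16


def weight_bytes(num_params: float, bytes_per_param: int) -> float:
    """Return the number of bytes needed to hold the weights alone."""
    return num_params * bytes_per_param


total = weight_bytes(PARAMS, BYTES_PER_PARAM)
print(f"{total / 1e9:.2f} GB")      # decimal gigabytes
print(f"{total / 2**30:.2f} GiB")   # binary gibibytes
```

Under these assumptions the weights come to roughly 14.7 GB, so actual inference will need headroom beyond that figure.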
