SOLO Model Card

Model details

Model type: SOLO is a 7B-parameter large vision-language model that uses a single Transformer architecture for unified vision-language modeling. SOLO accepts both raw image patches (in pixels) and text as input, without a separate pre-trained vision encoder.
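To make "raw image patches (in pixels)" concrete, here is a minimal sketch of splitting an image into flattened, non-overlapping pixel patches, the kind of input a single-Transformer model could consume without a vision encoder. The function name `patchify` and the patch size of 32 are illustrative assumptions, not taken from the SOLO codebase; the actual preprocessing may differ.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int = 32) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping pixel patches.

    patch_size=32 is an assumed value for illustration only.
    Returns an array of shape (num_patches, patch_size * patch_size * C),
    with patches ordered row by row (left to right, top to bottom).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be multiples of patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch_size, cols, patch_size, C) -> (rows, cols, patch_size, patch_size, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten each patch into one vector of raw pixel values.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


if __name__ == "__main__":
    image = np.zeros((64, 96, 3), dtype=np.float32)  # dummy 64x96 RGB image
    print(patchify(image).shape)  # (6, 3072)
```

In a setup like this, each flattened patch would typically be mapped to the model's hidden size by a learned linear projection and interleaved with text token embeddings; that step is omitted here.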

Model date: SOLO-7B was trained in June 2024.

Paper or resources for more information: Paper & Github

Where to send questions or comments about the model: https://github.com/Yangyi-Chen/SOLO/issues

Inference with Hugging Face

Please check this script for an example of performing inference with the model.

Model size: 7.26B params (Safetensors)

Tensor type: BF16
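
As a rough guide to hardware requirements, the parameter count and tensor type above give a lower bound on the memory needed just to hold the weights. The sketch below assumes 2 bytes per BF16 parameter and ignores activations, the KV cache, and framework overhead, so real usage will be higher; the helper name is illustrative.

```python
NUM_PARAMS = 7_260_000_000   # 7.26B params, as listed on this card
BYTES_PER_BF16_PARAM = 2     # bfloat16 stores each value in 16 bits


def weight_memory_bytes(num_params: int, bytes_per_param: int) -> int:
    """Memory needed to store the weights alone (no activations or KV cache)."""
    return num_params * bytes_per_param


if __name__ == "__main__":
    total = weight_memory_bytes(NUM_PARAMS, BYTES_PER_BF16_PARAM)
    print(f"{total / 1e9:.2f} GB")     # 14.52 GB
    print(f"{total / 2**30:.2f} GiB")  # 13.52 GiB
```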