Model Summary

MobileVLM V2 is a family of vision language models that significantly improves upon MobileVLM. It shows that a careful combination of novel architectural design, an improved training scheme tailored for mobile VLMs, and curation of rich, high-quality datasets can substantially improve VLM performance. Specifically, MobileVLM V2 1.7B achieves better or on-par performance on standard VLM benchmarks compared with much larger VLMs at the 3B scale. Notably, the MobileVLM_V2-3B model outperforms a large variety of VLMs at the 7B+ scale.

MobileVLM_V2-1.7B was built on our MobileLLaMA-1.4B-Chat to facilitate off-the-shelf deployment.

Model Sources

How to Get Started with the Model

Inference examples can be found on GitHub.
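Because this repository distributes the weights as GGUF, it can be useful to sanity-check a downloaded file before passing it to an inference runtime. The snippet below is a minimal sketch, not part of the MobileVLM tooling: it assumes a GGUF file of version 2 or later, whose header is the 4-byte magic `GGUF` followed by a little-endian uint32 version, a uint64 tensor count, and a uint64 metadata key/value count. The helper name `read_gguf_header` and the type `GGUFHeader` are hypothetical names introduced for illustration.

```python
import struct
from typing import NamedTuple


class GGUFHeader(NamedTuple):
    """Fixed-size fields at the start of a GGUF file (hypothetical helper type)."""
    version: int
    tensor_count: int
    metadata_kv_count: int


def read_gguf_header(path: str) -> GGUFHeader:
    """Read and validate the GGUF header; assumes GGUF version 2 or later."""
    with open(path, "rb") as f:
        raw = f.read(24)  # 4-byte magic + uint32 + two uint64 fields
    if len(raw) < 24 or raw[:4] != b"GGUF":
        raise ValueError(f"{path} does not look like a GGUF file")
    # "<" selects little-endian with no padding: I = uint32, Q = uint64.
    version, tensor_count, kv_count = struct.unpack("<IQQ", raw[4:24])
    if version < 2:
        # Version 1 stored the two counts as 32-bit values; not handled here.
        raise ValueError(f"unsupported GGUF version: {version}")
    return GGUFHeader(version, tensor_count, kv_count)
```

Calling `read_gguf_header` with the local path of the downloaded file returns the format version and the two counts, or raises `ValueError` if the file is truncated or is not GGUF.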

GGUF
Model size: 297M params
Architecture: clip