---
library_name: transformers
license: apache-2.0
datasets:
- HuggingFaceM4/the_cauldron
- HuggingFaceM4/Docmatix
- lmms-lab/LLaVA-OneVision-Data
- lmms-lab/M4-Instruct-Data
- HuggingFaceFV/finevideo
- MAmmoTH-VL/MAmmoTH-VL-Instruct-12M
- lmms-lab/LLaVA-Video-178K
- orrzohar/Video-STaR
- Mutonix/Vript
- TIGER-Lab/VISTA-400K
- Enxin/MovieChat-1K_train
- ShareGPT4Video/ShareGPT4Video
pipeline_tag: video-text-to-text
language:
- en
base_model:
- HuggingFaceTB/SmolVLM-Instruct
tags:
- mlx
---

# smdesai/SmolVLM2-2.2B-Instruct-4bit
This model was converted to MLX format from [`HuggingFaceTB/SmolVLM2-2.2B-Instruct`](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct) using mlx-vlm version **0.1.14**.
Refer to the [original model card](https://huggingface.co/HuggingFaceTB/SmolVLM2-2.2B-Instruct) for more details on the model.
## Use with mlx
```bash
pip install -U mlx-vlm
```
```bash
python -m mlx_vlm.generate --model smdesai/SmolVLM2-2.2B-Instruct-4bit --max-tokens 100 --temp 0.0 --prompt "Describe this image." --image <path_to_image>
```
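For programmatic use, mlx-vlm also exposes a Python API. Below is a minimal sketch assuming the `load`, `generate`, `apply_chat_template`, and `load_config` helpers as documented for mlx-vlm 0.1.x; verify the exact signatures against the version you have installed.

```python
# Minimal sketch of the mlx-vlm Python API (0.1.x). Signatures follow the
# mlx-vlm README for that release; check your installed version.
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "smdesai/SmolVLM2-2.2B-Instruct-4bit"

# Load the 4-bit MLX weights and the matching processor.
model, processor = load(model_path)
config = load_config(model_path)

# Local paths or URLs both work; <path_to_image> is a placeholder.
images = ["<path_to_image>"]
prompt = "Describe this image."

# Wrap the prompt in the model's chat template, inserting image tokens.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))

# Greedy decoding capped at 100 new tokens, mirroring the CLI call above.
output = generate(model, processor, formatted_prompt, images, max_tokens=100, temp=0.0, verbose=False)
print(output)
```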