Image-Text-to-Text
MLX
Safetensors
English
idefics2
multimodal
vision