Add idefics2-8b for HuggingChat

#53
by wangdafa - opened

HuggingChat doesn't have a multimodal model yet.

We are planning to do that if we make scaled versions of the model. Right now, at the 8B scale, even the best models are a bit too immature and often hallucinate.

@HugoLaurencon I feel it would be nice to have a smaller model, for example one based on phi-3. I'm trying that, but maybe the function convert_idefics2_weights_to_hf doesn't work for it?

HuggingFaceM4 org

It should work for a Llama/Mistral architecture, but if phi-3 differs from those, you might need to adapt the script.
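
One way to see where a script like this would need adapting is to compare the parameter names it expects with the ones the new text backbone actually exposes. The snippet below is only a rough sketch, not part of the conversion script: `diff_state_dict_keys` is a hypothetical helper, and the two key lists are illustrative examples (separate q/k/v projections in a Mistral-style layer versus a fused `qkv_proj` in a phi-3-style layer) that should be checked against the real checkpoints.

```python
def diff_state_dict_keys(expected_keys, actual_keys):
    """Report which parameter names are missing or unexpected.

    Hypothetical helper: it only compares names, not tensor shapes.
    """
    expected, actual = set(expected_keys), set(actual_keys)
    return {
        "missing": sorted(expected - actual),      # expected by the script, absent in the model
        "unexpected": sorted(actual - expected),   # present in the model, unknown to the script
    }


# Illustrative key names only (assumed, not read from a real checkpoint).
mistral_style = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.0.self_attn.v_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
]
phi3_style = [
    "model.layers.0.self_attn.qkv_proj.weight",
    "model.layers.0.self_attn.o_proj.weight",
]

report = diff_state_dict_keys(mistral_style, phi3_style)
for kind, names in report.items():
    for name in names:
        print(f"{kind}: {name}")
```

With real models, the two lists could come from the `state_dict().keys()` of each loaded model. Any name that shows up under "missing" or "unexpected" points to a mapping the script would have to handle differently.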
