Victor Sanh

VictorSanh

Posts 6

Post
Glad to see Idefics2 making its way onto the awesome OpenVLM Leaderboard, which ranks VLMs. 🏆
2nd in its category (<10B parameters and open weights)!

While InternLM-XComposer2 uses proprietary data, Idefics2 is built solely using openly available data.

Leaderboard: opencompass/open_vlm_leaderboard
Model: HuggingFaceM4/idefics2-8b
Post
Can't wait to see multimodal Llama 3!

We released a resource that might come in handy: The Cauldron 🍯

The Cauldron is a massive, manually curated collection of 50 vision-language datasets for instruction fine-tuning: 3.6M images and 30.3M query/answer pairs.

It covers a large variety of downstream uses: visual question answering on natural images, OCR, document/chart/figure/table understanding, textbook/academic questions, reasoning, captioning, spotting differences between two images, and screenshot-to-code.

Dataset: HuggingFaceM4/the_cauldron
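
For anyone who wants to look at the query/answer pairs, here is a minimal sketch of how one sub-dataset might be loaded and read with the `datasets` library. Several things here are assumptions, not something stated in the post: the config name `ai2d`, the `texts` field, and the `user`/`assistant` keys should all be checked against the dataset card. The helper names `iter_qa_pairs` and `load_subset` are hypothetical.

```python
from typing import Iterator


def iter_qa_pairs(example: dict) -> Iterator[tuple[str, str]]:
    """Yield (query, answer) pairs from one record.

    Assumption: a record has a "texts" list whose items carry "user" and
    "assistant" strings. Adjust the keys if the actual schema differs.
    """
    for turn in example.get("texts", []):
        yield turn["user"], turn["assistant"]


def load_subset(name: str = "ai2d", split: str = "train"):
    """Load one sub-dataset of The Cauldron (needs network access).

    Assumption: "ai2d" is a valid config name for this repository.
    """
    # Imported lazily so iter_qa_pairs stays usable without `datasets` installed.
    from datasets import load_dataset

    return load_dataset("HuggingFaceM4/the_cauldron", name, split=split)
```

With network access, something like `for q, a in iter_qa_pairs(load_subset()[0]): print(q, a)` should print the pairs attached to the first record, provided the assumed schema holds.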