Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Sign Up
VictorSanh 
posted an update 14 days ago
Post
2441
Can't wait to see multimodal LLama 3!

We released a resource that might come in handy: The Cauldron 🍯

The Cauldron is a massive manually-curated collection of 50 vision-language sets for instruction fine-tuning. 3.6M images, 30.3M query/answer pairs.

It covers a large variety of downstream uses: visual question answering on natural images, OCR, document/charts/figures/tables understanding, textbooks/academic question, reasoning, captioning, spotting differences between 2 images, and screenshot-to-code.

HuggingFaceM4/the_cauldron

weizhiwang/LLaVA-Llama-3-8B

First llava 1.5 llama 3 pretrain, managed to make a projector file out of it that works with any llama 3 8b model. (this can be used with any backend that supports llava mmproject.)

ChaoticNeutrals/Llava_1.5_Llama3_mmproj

In this post