@VictorSanh on Hugging Face: "Can't wait to see multimodal LLama 3! We released a resource that might come…"

Hugging Face

Join the conversation

Join the community of Machine Learners and AI enthusiasts.

Back to feed

VictorSanh

posted an update Apr 19, 2024

Post

2541

Can't wait to see multimodal LLama 3!

We released a resource that might come in handy: The Cauldron 🍯

The Cauldron is a massive manually-curated collection of 50 vision-language sets for instruction fine-tuning. 3.6M images, 30.3M query/answer pairs.

It covers a large variety of downstream uses: visual question answering on natural images, OCR, document/charts/figures/tables understanding, textbooks/academic question, reasoning, captioning, spotting differences between 2 images, and screenshot-to-code.

HuggingFaceM4/the_cauldron

Nitral-AI

Apr 22, 2024

•

edited Apr 22, 2024

weizhiwang/LLaVA-Llama-3-8B

First llava 1.5 llama 3 pretrain, managed to make a projector file out of it that works with any llama 3 8b model. (this can be used with any backend that supports llava mmproject.)

ChaoticNeutrals/Llava_1.5_Llama3_mmproj

In this post