---
language:
- en
license: mit
library_name: transformers
datasets:
- liuhaotian/LLaVA-Instruct-150K
- liuhaotian/LLaVA-Pretrain
---

# Model Card for LLaVA-Phi-2-3B

## Model Details

### Model Description

- **Developed by:** [LAION](https://laion.ai/), [SkunkworksAI](https://huggingface.co/SkunkworksAI) & [Ontocord](https://www.ontocord.ai/)
- **Model type:** LLaVA is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
- **Finetuned from model:** [Phi-2](https://huggingface.co/microsoft/phi-2)
- **License:** MIT
- **Demo:** [llava-phi-2-3b-demo](https://huggingface.co/spaces/marianna13/llava-phi-2-3b-demo)

### Model Sources

- **Repository:** [BakLLaVA](https://github.com/SkunkworksAI/BakLLaVA)

## Evaluation

### Benchmarks

| Model | Parameters | SQA | GQA | TextVQA | POPE |
| --- | --- | --- | --- | --- | --- |
| [LLaVA-1.5](https://huggingface.co/liuhaotian/llava-v1.5-7b) | 7.3B | 68.0 | **62.0** | **58.3** | 85.3 |
| [MC-LLaVA-3B](https://huggingface.co/visheratin/MC-LLaVA-3b) | 3B | - | 49.6 | 38.59 | - |
| [LLaVA-Phi](https://arxiv.org/pdf/2401.02330.pdf) | 3B | 68.4 | - | 48.6 | 85.0 |
| [moondream1](https://huggingface.co/vikhyatk/moondream1) | 1.6B | - | 56.3 | 39.8 | - |
| **llava-phi-2-3b** | 3B | **69.0** | 51.2 | 47.0 | **86.0** |

### Image Captioning (MS COCO)

| Model | BLEU-1 | BLEU-2 | BLEU-3 | BLEU-4 | METEOR | ROUGE-L | CIDEr | SPICE |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| llava-1.5-7b | 75.8 | 59.8 | 45.0 | 33.3 | 29.4 | 57.7 | 108.8 | 23.5 |
| **llava-phi-2-3b** | 67.7 | 50.5 | 35.7 | 24.2 | 27.0 | 52.4 | 85.0 | 20.7 |