metadata
language:
  - en
license: mit
library_name: transformers
datasets:
  - liuhaotian/LLaVA-Instruct-150K
  - liuhaotian/LLaVA-Pretrain

Model Card for LLaVa-Phi-2-3B

Model Details

Model Description

  • Developed by: LAION, SkunkworksAI & Ontocord
  • Model type: LLaVA-Phi-2-3B is an open-source chatbot trained by fine-tuning Phi-2 on GPT-generated multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture.
  • Finetuned from model: Phi-2
  • License: MIT
  • Demo: llava-phi-2-3b-demo

Model Sources

Evaluation

Benchmarks

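| Model          | Parameters | SQA  | GQA  | TextVQA | POPE |
|----------------|------------|------|------|---------|------|
| LLaVA-1.5      | 7.3B       | 68.0 | 62.0 | 58.3    | 85.3 |
| MC-LLaVA-3B    | 3B         | -    | 49.6 | 38.59   | -    |
| LLaVA-Phi      | 3B         | 68.4 | -    | 48.6    | 85.0 |
| moondream1     | 1.6B       | -    | 56.3 | 39.8    | -    |
| llava-phi-2-3b | 3B         | 69.0 | 51.2 | 47.0    | 86.0 |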

Image Captioning (MS COCO)

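| Model          | BLEU_1 | BLEU_2 | BLEU_3 | BLEU_4 | METEOR | ROUGE_L | CIDEr | SPICE |
|----------------|--------|--------|--------|--------|--------|---------|-------|-------|
| llava-1.5-7b   | 75.8   | 59.8   | 45     | 33.3   | 29.4   | 57.7    | 108.8 | 23.5  |
| llava-phi-2-3b | 67.7   | 50.5   | 35.7   | 24.2   | 27.0   | 52.4    | 85.0  | 20.7  |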
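
To make the captioning comparison easier to read, the short sketch below expresses each llava-phi-2-3b score as a percentage of the corresponding llava-1.5-7b score. The numbers are copied from the table above. The helper name `relative_scores` and the variable names are illustrative assumptions, not part of the model's code or any evaluation toolkit.

```python
# Scores copied from the Image Captioning (MS COCO) table in this card.
METRICS = ["BLEU_1", "BLEU_2", "BLEU_3", "BLEU_4",
           "METEOR", "ROUGE_L", "CIDEr", "SPICE"]
LLAVA_15_7B = [75.8, 59.8, 45.0, 33.3, 29.4, 57.7, 108.8, 23.5]
LLAVA_PHI_2_3B = [67.7, 50.5, 35.7, 24.2, 27.0, 52.4, 85.0, 20.7]


def relative_scores(small, large):
    """Hypothetical helper: each small-model score as a % of the large-model score."""
    return [round(100.0 * s / l, 1) for s, l in zip(small, large)]


# Map each metric name to the 3B model's score relative to the 7B model.
ratios = dict(zip(METRICS, relative_scores(LLAVA_PHI_2_3B, LLAVA_15_7B)))

for name, pct in ratios.items():
    print(f"{name:8s} {pct:5.1f}% of llava-1.5-7b")
```

Under these numbers, the 3B model keeps roughly 78% of the 7B model's CIDEr and about 92% of its METEOR. These ratios only restate the table and say nothing about statistical significance.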