# Model Card for Llava-Phi2

This is a multimodal implementation of the Phi-2 model, inspired by LLaVA-Phi.

## Model Details

1. LLM Backbone: Phi-2
2. Vision Tower: clip-vit-large-patch14-336
3. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
4. Finetuning Dataset: Instruct 150k dataset based on COCO
5. Finetuned Model: RaviNaik/Llava-Phi2
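In a LLaVA-style architecture, the CLIP vision tower encodes the image into per-patch features, which a small projector maps into the language model's embedding space so they can be interleaved with text tokens. The sketch below illustrates that projection step only; the dimensions (1024 for CLIP ViT-L/14-336 hidden states, 2560 for Phi-2 embeddings), the single linear layer, and the function name are illustrative assumptions, not this repository's actual code.

```python
# Illustrative sketch of the LLaVA-style vision-to-language projector.
# Assumed dimensions: CLIP ViT-L/14-336 patch features (1024-d) are
# mapped to Phi-2 token embeddings (2560-d); the real model may use
# an MLP projector instead of a single linear layer.

def project_patches(patch_features, weight, bias):
    """Map vision-tower patch features into LLM token embeddings.

    patch_features: list of patches, each a list of floats (vision dim)
    weight: vision_dim x llm_dim matrix as a list of lists
    bias: list of llm_dim floats
    """
    vision_dim = len(weight)
    llm_dim = len(bias)
    projected = []
    for patch in patch_features:
        token = [
            sum(patch[i] * weight[i][j] for i in range(vision_dim)) + bias[j]
            for j in range(llm_dim)
        ]
        projected.append(token)
    return projected
```

The resulting image tokens are then spliced into the text embedding sequence before being fed to the Phi-2 backbone.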


## How to Get Started with the Model

Use the code below to get started with the model.

1. Clone this repository and navigate to the llava-phi folder:

```shell
git clone https://github.com/zhuyiche/llava-phi.git
cd llava-phi
```

2. Install the package:

```shell
conda create -n llava_phi python=3.10 -y
conda activate llava_phi
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
```

3. Run the model:

```shell
python llava_phi/eval/run_llava_phi.py --model-path="RaviNaik/Llava-Phi2" \
    --image-file="https://huggingface.co/RaviNaik/Llava-Phi2/resolve/main/people.jpg?download=true" \
    --query="How many people are there in the image?"
```
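If you prefer to drive the evaluation script from Python rather than the shell, the same invocation can be assembled with the standard library. The flags mirror the CLI call shown above; the helper function name is hypothetical, and running the command requires the cloned repository and environment from the previous steps.

```python
import subprocess

def build_run_command(model_path, image_file, query):
    # Hypothetical helper: assembles the CLI invocation shown in the
    # README for llava_phi/eval/run_llava_phi.py.
    return [
        "python", "llava_phi/eval/run_llava_phi.py",
        f"--model-path={model_path}",
        f"--image-file={image_file}",
        f"--query={query}",
    ]

cmd = build_run_command(
    "RaviNaik/Llava-Phi2",
    "https://huggingface.co/RaviNaik/Llava-Phi2/resolve/main/people.jpg?download=true",
    "How many people are there in the image?",
)
# subprocess.run(cmd, check=True)  # uncomment inside the llava-phi repo/env
```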

## Acknowledgement

This implementation is based on the wonderful work done by:

- LLaVA-Phi
- LLaVA
- Phi-2
