---
datasets:
- Lin-Chen/ShareGPT4V
pipeline_tag: image-text-to-text
library_name: xtuner
---
## Model

llava-phi-3-mini is a LLaVA model fine-tuned from microsoft/Phi-3-mini-4k-instruct and CLIP-ViT-Large-patch14-336 on ShareGPT4V-PT and InternVL-SFT with XTuner.

**Note:** This model is in XTuner LLaVA format. Versions in the official LLaVA format and in the HuggingFace LLaVA format are available at xtuner/llava-phi-3-mini and xtuner/llava-phi-3-mini-hf, respectively.
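If you prefer a pure-Python workflow over the CLI shown below, the HuggingFace-format variant can be loaded directly with `transformers`. A minimal sketch, assuming a recent `transformers` release with LLaVA support; the prompt template and sample image are illustrative, not taken from this card:

```python
import requests
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "xtuner/llava-phi-3-mini-hf"
model = LlavaForConditionalGeneration.from_pretrained(model_id, device_map="auto")
processor = AutoProcessor.from_pretrained(model_id)

# Phi-3 chat markup with an <image> placeholder; assumed here, so check the
# -hf model card for the exact template.
prompt = "<|user|>\n<image>\nWhat is shown in this image?<|end|>\n<|assistant|>\n"
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```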
## Details

| Model | Visual Encoder | Projector | Resolution | Pretraining Strategy | Fine-tuning Strategy | Pretrain Dataset | Fine-tune Dataset |
| :---- | :------------- | :-------- | :--------- | :------------------- | :------------------- | :--------------- | :---------------- |
| LLaVA-v1.5-7B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, Frozen ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) |
| LLaVA-Llama-3-8B | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | LLaVA-PT (558K) | LLaVA-Mix (665K) |
| LLaVA-Llama-3-8B-v1.1 | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, LoRA ViT | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) |
| LLaVA-Phi-3-mini | CLIP-L | MLP | 336 | Frozen LLM, Frozen ViT | Full LLM, Full ViT | ShareGPT4V-PT (1246K) | InternVL-SFT (1268K) |
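For context on the "Projector" column: in LLaVA-style models the projector is a small MLP that maps visual-encoder features into the LLM's embedding space. A purely illustrative sketch (not XTuner's actual implementation), using the known hidden sizes of CLIP-L (1024) and Phi-3-mini (3072):

```python
import torch.nn as nn

# Illustrative LLaVA-style 2-layer MLP projector (an assumption; xtuner's
# real module may differ in detail). Maps CLIP-L patch features (1024-d)
# into the Phi-3-mini token-embedding space (3072-d).
projector = nn.Sequential(
    nn.Linear(1024, 3072),
    nn.GELU(),
    nn.Linear(3072, 3072),
)
```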
## Quickstart
### Installation

```shell
pip install 'git+https://github.com/InternLM/xtuner.git#egg=xtuner[deepspeed]'
```
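As an optional sanity check (not part of the original card), you can confirm the install is visible from Python using only the standard library:

```python
# Query the installed xtuner distribution version; works for any
# pip-installed package, so no xtuner-specific API is assumed.
from importlib.metadata import version

print(version("xtuner"))
```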
### Chat

```shell
xtuner chat microsoft/Phi-3-mini-4k-instruct \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-phi-3-mini-xtuner \
  --prompt-template phi3_chat \
  --image $IMAGE_PATH
```
### MMBench Evaluation

XTuner integrates MMBench evaluation; you can run it with the following command:
```shell
xtuner mmbench microsoft/Phi-3-mini-4k-instruct \
  --visual-encoder openai/clip-vit-large-patch14-336 \
  --llava xtuner/llava-phi-3-mini-xtuner \
  --prompt-template phi3_chat \
  --data-path $MMBENCH_DATA_PATH \
  --work-dir $RESULT_PATH
```
After the evaluation completes, results for the dev set are printed directly. For the test set, submit `mmbench_result.xlsx` to the official MMBench leaderboard to obtain the final accuracy.
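If you want to eyeball the generated predictions before submitting, a minimal sketch (my assumption, not part of the card; requires pandas and openpyxl):

```python
import pandas as pd

# Path assumes the --work-dir used above; substitute your actual $RESULT_PATH.
df = pd.read_excel("RESULT_PATH/mmbench_result.xlsx")
print(df.shape)   # rows = questions, columns = fields written by xtuner
print(df.head())  # first few predictions
```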
### Training

- Pretrain

  ```shell
  NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_clip_vit_large_p14_336_e1_gpu8_sharegpt4v_pretrain --deepspeed deepspeed_zero2 --seed 1024
  ```

- Fine-tune

  ```shell
  NPROC_PER_NODE=8 xtuner train llava_phi3_mini_4k_instruct_full_clip_vit_large_p14_336_full_e2_gpu8_internvl_finetune --deepspeed deepspeed_zero2 --seed 1024
  ```
## Citation

```bibtex
@misc{2023xtuner,
    title={XTuner: A Toolkit for Efficiently Fine-tuning LLM},
    author={XTuner Contributors},
    howpublished = {\url{https://github.com/InternLM/xtuner}},
    year={2023}
}
```