---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---

**Base Model**: BLIP2-T5, pretrained version

**Finetune data**:
* LLaVA 150k (one instruction-answer pair sampled from each multi-round conversation)
* MiniGPT-4: 3,500 pairs

**Hyper-parameters**:
* BLIP2-flan-t5-xl + LLaVA (initial commits)
  * **v0**:
    * lr = 2e-5, decayed to 0.0 with a cosine LR scheduler
    * global batch size (gbs) = 32
    * image size = 480
    * weight decay = 0.05
  * **v1 (same as LLaVA)**:
    * lr = 2e-5
    * gbs = 32
    * image size = 224
    * weight decay = 0.0
* Others:
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
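
For reference, below is a minimal sketch of how the v1 hyper-parameters above could be expressed with Hugging Face `transformers`. This is not the training script behind this checkpoint: the output directory, epoch count, precision, and the per-device/accumulation split of the global batch size are assumptions, and only the values listed in this card (lr, gbs, image size, weight decay, and the cosine schedule from v0) are taken from it.

```python
import torch
from transformers import (
    Blip2Processor,
    Blip2ForConditionalGeneration,
    TrainingArguments,
)

# Pretrained base model referenced in this card.
BASE_MODEL = "Salesforce/blip2-flan-t5-xl"

processor = Blip2Processor.from_pretrained(BASE_MODEL)
# v1 image size (v0 used 480 instead).
processor.image_processor.size = {"height": 224, "width": 224}

model = Blip2ForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16
)

# v1: lr = 2e-5, global batch size = 32, weight decay = 0.0.
# Assumption: 8 GPUs x 4 per device x 1 accumulation step = global batch size 32.
training_args = TrainingArguments(
    output_dir="blip2-flan-t5-xl-llava",  # placeholder, not this repo's name
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    weight_decay=0.0,
    lr_scheduler_type="cosine",  # cosine decay to 0.0 is stated for v0; v1's scheduler is not stated
    num_train_epochs=3,          # assumption: epoch count is not given in this card
    bf16=True,                   # assumption: precision is not given in this card
    remove_unused_columns=False,
)
```

These arguments would then be passed to a `Trainer` (or an equivalent training loop) together with the LLaVA 150k and MiniGPT-4 instruction data described above.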