---
license: apache-2.0
---

**Base Model**: BLIP2-t5 pretrained version

**Finetune data**: LLaVA 150k (for multi-round conversations, one instruction-answer pair is sampled)

**Hyper-parameters**:

* learning rate (lr) = 2e-5
* global batch size (gbs) = 32
* image size = 480
* weight decay = 0.05
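
The single-pair sampling described under the finetune data could look roughly like the sketch below. It is not the actual preprocessing code used for this model. It assumes each record stores its dialogue as a list of alternating turns with `from` (`"human"` or `"gpt"`) and `value` keys, and it assumes the pair is drawn uniformly at random, which this card does not state. The names `sample_pair` and `HPARAMS` are hypothetical.

```python
import random

# Hyper-parameters as listed in this card; the key names are illustrative only.
HPARAMS = {
    "lr": 2e-5,
    "global_batch_size": 32,
    "image_size": 480,
    "weight_decay": 0.05,
}


def sample_pair(conversations, rng=random):
    """Return one (instruction, answer) pair from a conversation.

    Assumption: turns alternate human/gpt and are dicts with
    "from" and "value" keys. Uniform random choice is also an assumption.
    """
    pairs = [
        (question["value"], answer["value"])
        for question, answer in zip(conversations[::2], conversations[1::2])
        if question["from"] == "human" and answer["from"] == "gpt"
    ]
    if not pairs:
        raise ValueError("no instruction-answer pair found")
    return rng.choice(pairs)
```

A single-round conversation yields its only pair unchanged. Passing a seeded `random.Random` instance as `rng` makes the selection reproducible.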