---
license: apache-2.0
---

**Base Model**: BLIP2-t5 (pretrained version)

**Fine-tuning data**:
* LLAVA 150k (for multi-round conversations, one instruction-answer pair is sampled; see the sampling sketch below)
* MiniGPT4, 3500 pairs

**Hyper-parameters** (gbs = global batch size):
* BLIP2-flant5-xl + LLAVA (initial commits)
  * **v0**:
    * lr = 2e-5, decayed to 0.0 with a cosine lr scheduler (see the schedule sketch below)
    * gbs = 32
    * image size = 480
    * weight decay = 0.05
  * **v1 (same as LLAVA)**:
    * lr = 2e-5
    * gbs = 32
    * image size = 480
    * weight decay = 0.0
* BLIP2-flant5-xl + MiniGPT4
  * lr = 2e-5
  * gbs = 32
  * image size = 480
  * weight decay = 0.0
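For reference, here is a minimal sketch of the LLAVA sampling step described above: one instruction-answer pair is drawn at random from each multi-round conversation. The record layout (alternating `human`/`gpt` turns, each with a `value` field) is an assumption based on the public LLAVA-150k release format, not something stated in this card.

```python
import random

def sample_one_round(conversation: list[dict]) -> tuple[str, str]:
    """Pick one (instruction, answer) pair from a multi-round conversation.

    Assumes `conversation` alternates human/gpt turns, as in the public
    LLAVA-150k JSON format (an assumption, not stated in this card).
    """
    # Pair up consecutive human/gpt turns into rounds.
    rounds = [
        (conversation[i]["value"], conversation[i + 1]["value"])
        for i in range(0, len(conversation) - 1, 2)
    ]
    return random.choice(rounds)

# Example with a two-round conversation.
conv = [
    {"from": "human", "value": "What is in the image?"},
    {"from": "gpt", "value": "A dog on a beach."},
    {"from": "human", "value": "What is the dog doing?"},
    {"from": "gpt", "value": "It is chasing a ball."},
]
instruction, answer = sample_one_round(conv)
print(instruction, "->", answer)
```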
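And a minimal sketch of the v0 learning-rate schedule, assuming a plain cosine decay from the 2e-5 peak to 0.0 with no warmup (the card does not mention warmup); `total_steps` below is illustrative, not taken from the training runs.

```python
import math

def cosine_lr(step: int, total_steps: int, peak_lr: float = 2e-5) -> float:
    """Cosine-annealed learning rate, decaying from peak_lr to 0.0.

    No warmup is modeled here; that is an assumption, since the card
    only says lr = 2e-5 --> 0.0 with a cosine scheduler.
    """
    progress = min(step, total_steps) / total_steps
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))

# Example: lr at the start, midpoint, and end of a hypothetical 10k-step run.
for s in (0, 5000, 10000):
    print(s, cosine_lr(s, 10000))
```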