---
license: apache-2.0
tags:
- llava
pipeline_tag: image-text-to-text
---

**Base Model**: BLIP2-T5, pretrained version

**Finetune data**:
* LLaVA 150k (one instruction-answer pair sampled from each multi-round conversation)
* MiniGPT-4: 3,500 pairs

**Hyper-parameters**:
* BLIP2-flan-t5-xl + LLaVA (initial commits)
  * **v0**:
    * lr = 2e-5, decayed to 0.0 with a cosine LR scheduler
    * global batch size (gbs) = 32
    * image size = 480
    * weight decay = 0.05
  * **v1 (same as LLaVA)**:
    * lr = 2e-5
    * gbs = 32
    * image size = 224
    * weight decay = 0.0
* Others:
  * lr = 2e-5
  * gbs = 32
  * image size = 224
  * weight decay = 0.0
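
For reference, below is a minimal sketch of how the v1 hyper-parameters above could be expressed with Hugging Face `transformers`. This is not the training script behind this checkpoint: the output directory, epoch count, precision, and the per-device/accumulation split of the global batch size are assumptions, and only the values listed in this card (lr, gbs, image size, weight decay, and the cosine schedule from v0) are taken from it.

```python
import torch
from transformers import (
    Blip2Processor,
    Blip2ForConditionalGeneration,
    TrainingArguments,
)

# Pretrained base model referenced in this card.
BASE_MODEL = "Salesforce/blip2-flan-t5-xl"

processor = Blip2Processor.from_pretrained(BASE_MODEL)
# v1 image size (v0 used 480 instead).
processor.image_processor.size = {"height": 224, "width": 224}

model = Blip2ForConditionalGeneration.from_pretrained(
    BASE_MODEL, torch_dtype=torch.bfloat16
)

# v1: lr = 2e-5, global batch size = 32, weight decay = 0.0.
# Assumption: 8 GPUs x 4 per device x 1 accumulation step = global batch size 32.
training_args = TrainingArguments(
    output_dir="blip2-flan-t5-xl-llava",  # placeholder, not this repo's name
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=1,
    weight_decay=0.0,
    lr_scheduler_type="cosine",  # cosine decay to 0.0 is stated for v0; v1's scheduler is not stated
    num_train_epochs=3,          # assumption: epoch count is not given in this card
    bf16=True,                   # assumption: precision is not given in this card
    remove_unused_columns=False,
)
```

These arguments would then be passed to a `Trainer` (or an equivalent training loop) together with the LLaVA 150k and MiniGPT-4 instruction data described above.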