---
title: MultiModal Phi2
emoji: 🚀
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 3.35.2
app_file: app.py
pinned: false
license: mit
---

## Phi2: Multimodal Finetuning

### Details
1. LLM Backbone: Phi2
2. Vision Tower: clip-vit-large-patch14-336
3. Audio Model: Whisper
4. Pretraining Dataset: LAION-CC-SBU dataset with BLIP captions (200k samples)
5. Finetuning Dataset: Instruct 150k dataset based on COCO

### Design
![Model design diagram](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/56df24cd-2681-4e17-ab64-9652f609b15f)

### Pretraining
#### Training Loss Curve
![Pretraining loss curve](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/b6c37a95-0a56-4b52-8719-3ff56dc1b703)

#### Learning Rate
![Pretraining learning rate schedule](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/44d9a11b-b28d-47e1-ba1d-d6dc22ebe748)

#### Training Logs
![Pretraining logs](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/76543d98-d9fe-4c1a-ac47-3d06e48053ad)

### Finetuning
#### Training Loss Curve
![Finetuning loss curve](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/45ef40bd-fae5-4cfe-a522-c0eed2833230)

#### Learning Rate
![Finetuning learning rate schedule](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/df60ee62-a537-4e36-a7f7-f7111e101162)

#### Training Logs
![Finetuning logs](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/2747acce-bc99-4c37-a05a-d5e81cb9aa9d)

### Results
![Results](https://github.com/RaviNaik/ERA-CAPSTONE/assets/23289802/f12a9f04-df32-413e-b957-774c30381b2b)
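
### Image Token Budget (Sketch)

This README pairs a clip-vit-large-patch14-336 vision tower with a Phi2 backbone but does not say how image features reach the language model. The following is a minimal sketch, assuming a LLaVA-style setup: the CLS token is dropped, the remaining patch embeddings pass through a single linear projector (hypothetical; the project may use an MLP or something else), and the projected vectors replace a hypothetical `<image>` placeholder in the prompt. The hidden sizes (1024 for ViT-L, 2560 for Phi-2) come from the public model configs; the placeholder strings and helper names are illustrative only.

```python
# Hypothetical sketch of the image-token budget in a LLaVA-style layout.
# Assumptions (not taken from this project): CLS token dropped, a single
# Linear(1024 -> 2560) projector, and "<image>" as the prompt placeholder.

CLIP_IMAGE_SIZE = 336   # from the model name clip-vit-large-patch14-336
CLIP_PATCH_SIZE = 14    # from the model name (patch14)
CLIP_HIDDEN = 1024      # ViT-L hidden size
PHI2_HIDDEN = 2560      # Phi-2 hidden size

IMAGE_PLACEHOLDER = "<image>"   # hypothetical marker in the text prompt
IMAGE_SLOT = "<img_emb>"        # hypothetical slot for one projected patch


def num_patch_tokens(image_size: int = CLIP_IMAGE_SIZE,
                     patch_size: int = CLIP_PATCH_SIZE) -> int:
    """Patch embeddings a ViT emits for a square image, excluding CLS."""
    if image_size % patch_size != 0:
        raise ValueError("image size must be a multiple of patch size")
    per_side = image_size // patch_size
    return per_side * per_side


def linear_projector_params(in_dim: int = CLIP_HIDDEN,
                            out_dim: int = PHI2_HIDDEN) -> int:
    """Parameter count of one Linear(in_dim, out_dim) layer with bias."""
    return in_dim * out_dim + out_dim


def build_layout(prompt_tokens: list[str], n_image_tokens: int) -> list[str]:
    """Expand each image placeholder into n_image_tokens embedding slots."""
    layout: list[str] = []
    for tok in prompt_tokens:
        if tok == IMAGE_PLACEHOLDER:
            layout.extend([IMAGE_SLOT] * n_image_tokens)
        else:
            layout.append(tok)
    return layout


if __name__ == "__main__":
    n = num_patch_tokens()
    print(n)                          # 576 (24 x 24 patches)
    print(linear_projector_params())  # 2624000
    seq = build_layout(["Describe", IMAGE_PLACEHOLDER, "this."], n)
    print(len(seq))                   # 578
```

Under these assumptions each image occupies 576 positions of Phi-2's 2048-token context, which leaves room for the instruction and answer text but is worth keeping in mind when packing multi-turn finetuning samples.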