|
# Overview |
|
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6550c4f27bbfce1878f5f280/AnqbCNf6pRiQ_5uNX0r4d.png) |
|
Volcano employs a single LMM to generate an initial response, feedback, and a revision, as well as the decision to accept or reject that revision. It follows a sequential, iterative critique-revision-decide loop.
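The loop above can be sketched in a few lines of Python. Everything here is a hypothetical stand-in: `generate`, `critique`, `revise`, and `decide` are not real Volcano APIs but placeholders for prompts to the same underlying LMM, and `StubLMM` only simulates one round of improvement.

```python
def volcano_loop(image, question, model, max_iters=3):
    """Iteratively refine an answer with self-feedback (sketch, not the real API)."""
    answer = model.generate(image, question)                  # initial response
    for _ in range(max_iters):
        feedback = model.critique(image, question, answer)    # self-feedback
        revised = model.revise(image, question, answer, feedback)
        # Decide step: keep the revision only if the model prefers it.
        if model.decide(image, question, answer, revised) == "revision":
            answer = revised
        else:
            break                                             # accept current answer
    return answer


class StubLMM:
    """Toy model: improves the answer once, then prefers the current answer."""
    def generate(self, image, question):
        return "draft"
    def critique(self, image, question, answer):
        return "add visual detail"
    def revise(self, image, question, answer, feedback):
        return answer + "+revised"
    def decide(self, image, question, answer, revised):
        return "revision" if answer == "draft" else "original"


print(volcano_loop(None, "What is shown?", StubLMM()))  # prints "draft+revised"
```

The key design point is that all four roles are played by one model; the decide step prevents unnecessary revisions from degrading an already-good answer.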
|
|
|
# Model details |
|
|
|
**Model type:** |
|
Volcano is a multimodal self-feedback guided revision model. It was trained from the Vicuna model on visual instruction tuning data together with multimodal feedback and revision data collected via GPT-3.5-Turbo, following the LLaVA methodology.
|
|
|
**Model date:** |
|
Volcano-7b was trained in October 2023. |
|
|
|
**Paper or resources for more information:** |
|
|
|
## Training dataset |
|
- 274K Volcano-train data
|
- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP. |
|
- 158K GPT-generated multimodal instruction-following data. |
|
- 450K academic-task-oriented VQA data mixture. |
|
- 40K ShareGPT data. |