Commit fd95007 (parent: 7a2c093) by liuhaotian: Update README.md

README.md
The primary use of LLaVA is research on large multimodal models and chatbots.

The primary intended users of the model are researchers and hobbyists in computer vision, natural language processing, machine learning, and artificial intelligence.
## Training dataset

- 558K filtered image-text pairs from LAION/CC/SBU, captioned by BLIP.
- 80K GPT-generated multimodal instruction-following data.

## Evaluation dataset

A preliminary evaluation of model quality is conducted on a set of 90 visual reasoning questions built from 30 unique images randomly sampled from COCO val 2014; each image is associated with three types of questions: conversational, detailed description, and complex reasoning. We utilize GPT-4 to judge the model outputs.
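The evaluation set construction above (30 images, 3 question types each, 90 questions total) can be sketched as follows. This is a hypothetical illustration, not the official LLaVA evaluation script; the id pool and field names are assumptions.

```python
import random

# Hypothetical sketch (not the official LLaVA eval script) of the setup
# described above: 30 images sampled from COCO val 2014, each paired with
# one question of each of the three types, giving 90 questions in total.
QUESTION_TYPES = ["conversation", "detail_description", "complex_reasoning"]

def sample_images(image_id_pool, n=30, seed=0):
    """Randomly sample n unique image ids from the pool."""
    return random.Random(seed).sample(image_id_pool, n)

def build_eval_set(image_ids):
    """Pair every sampled image with one question slot per question type."""
    return [
        {"image_id": img, "type": qtype}
        for img in image_ids
        for qtype in QUESTION_TYPES
    ]

# Placeholder id pool standing in for the 40,504 images in COCO val 2014.
coco_val_ids = list(range(40504))
eval_set = build_eval_set(sample_images(coco_val_ids))
print(len(eval_set))  # 90 = 30 images x 3 question types
# Judging each model answer with GPT-4 would be a separate API call per
# question, scoring the answer against a reference; it is omitted here.
```

The GPT-4 judging step itself is stateless per question, so the set can be built once and the judge calls batched or parallelized independently.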