Update README.md (#1)

Opened by jiangchengchengNLP.
README.md (changed)
@@ -8,7 +8,7 @@ base_model:
 ---
 # Visual Language Model Based on Qwen and CLIP
 
-This is a visual language multimodal model built upon the Qwen series language models and the CLIP visual encoder. It has been trained for 10 epochs on the LLaVA pre-training dataset and nearly 800K examples (150K instruction fine-tuning and 665K instruction mixed fine-tuning). However, due to data size is larger
+This is a visual language multimodal model built upon the Qwen series language models and the CLIP visual encoder. It has been trained for 10 epochs on the LLaVA pre-training dataset and nearly 800K examples (150K instruction fine-tuning and 665K instruction mixed fine-tuning). However, due to data size is larger for model, so it can only perform simple question-answering tasks on images and currently supports only English question answering.
 
 ## Training Details
 