---
license: apache-2.0
datasets:
- liuhaotian/LLaVA-CC3M-Pretrain-595K
base_model:
- Qwen/Qwen2.5-0.5B
- openai/clip-vit-large-patch14-336
---
# Visual Language Model Based on Qwen and CLIP

This is a visual language multimodal model built on the Qwen series language models and the CLIP visual encoder. It was trained for 10 epochs on the LLaVA pre-training dataset and on nearly 800K instruction examples (150K instruction fine-tuning samples and 665K mixed instruction fine-tuning samples). Because of its small size, it can only handle simple question answering about images, and it currently supports English only.
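
Below is a minimal LLaVA-style inference sketch built only from the two base models listed in the metadata above. The linear projector, its dimensions, and the image/prompt handling are illustrative assumptions, not the trained components shipped with this checkpoint.

```python
import torch
from PIL import Image
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

# Base components named in this card.
llm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")

# Hypothetical projector: maps CLIP ViT-L/14-336 patch features (1024-dim)
# into the Qwen2.5-0.5B embedding space (896-dim). Randomly initialized here;
# the real checkpoint would provide trained projector weights.
projector = torch.nn.Linear(1024, 896)

image = Image.open("example.jpg")
pixel_values = image_processor(images=image, return_tensors="pt").pixel_values

with torch.no_grad():
    # Drop the CLS token and keep the 24x24 patch tokens, LLaVA-style.
    patch_features = vision_tower(pixel_values).last_hidden_state[:, 1:, :]
    image_embeds = projector(patch_features)

    # Prepend the projected image tokens to the text embeddings and generate.
    prompt_ids = tokenizer("Describe the image.", return_tensors="pt").input_ids
    text_embeds = llm.get_input_embeddings()(prompt_ids)
    inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)
    output_ids = llm.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        max_new_tokens=64,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```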