jiangchengchengNLP committed
Commit: 89ebfd1
Parent: 4ab846c

Update README.md

Files changed (1): README.md (+8, -0)
README.md CHANGED
@@ -1,3 +1,11 @@
+ ---
+ license: apache-2.0
+ datasets:
+ - liuhaotian/LLaVA-CC3M-Pretrain-595K
+ base_model:
+ - Qwen/Qwen2.5-0.5B
+ - openai/clip-vit-large-patch14-336
+ ---
  # Visual Language Model Based on Qwen and CLIP
 
  This is a multimodal visual-language model built on the Qwen series language models and the CLIP visual encoder. It was trained for 10 epochs on the LLaVA pre-training dataset and on nearly 800K instruction examples (150K instruction fine-tuning plus 665K mixed instruction fine-tuning). However, due to its small size, it can only handle simple question answering about images, and it currently supports English only.
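As context for the components named in the metadata above, here is a minimal sketch of how the two published building blocks could be wired together for a single image-question round trip. It loads the CLIP vision tower and the Qwen2.5-0.5B language model with standard `transformers` classes; the linear projector, its untrained weights, the placeholder image path, and the prompt layout are illustrative assumptions only, not the released checkpoint's actual loading code, which ships its own trained projector and prompt format.

```python
# Illustrative sketch only: the projector and prompt handling below are
# assumptions; the released checkpoint provides its own trained projector.
import torch
from PIL import Image
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    CLIPImageProcessor,
    CLIPVisionModel,
)

# Load the two published building blocks named in the model card.
vision = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
image_processor = CLIPImageProcessor.from_pretrained("openai/clip-vit-large-patch14-336")
lm = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-0.5B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")

# Hypothetical projector: maps CLIP patch features into the Qwen embedding
# space. The real model has trained weights here; these are random.
projector = torch.nn.Linear(vision.config.hidden_size, lm.config.hidden_size)

image = Image.open("example.jpg")  # placeholder path
question = "What is in this picture?"

with torch.no_grad():
    # Encode the image into a sequence of patch embeddings.
    pixel_values = image_processor(images=image, return_tensors="pt").pixel_values
    patch_states = vision(pixel_values=pixel_values).last_hidden_state
    image_embeds = projector(patch_states)

    # Embed the question text and prepend the projected image tokens.
    text_ids = tokenizer(question, return_tensors="pt").input_ids
    text_embeds = lm.get_input_embeddings()(text_ids)
    inputs_embeds = torch.cat([image_embeds, text_embeds], dim=1)
    attention_mask = torch.ones(inputs_embeds.shape[:2], dtype=torch.long)

    # Generate an answer conditioned on the combined sequence.
    output_ids = lm.generate(
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        max_new_tokens=32,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```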