Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -8,7 +8,7 @@ base_model:
  ---
  # Visual Language Model Based on Qwen and CLIP
 
- This is a visual language multimodal model built upon the Qwen series language models and the CLIP visual encoder. It has been trained for 10 epochs on the LLaVA pre-training dataset and nearly 800K examples (150K instruction fine-tuning and 665K instruction mixed fine-tuning). However, due to data size is larger than model, so it can only perform sample question-answering tasks on images and currently supports only English question answering.
+ This is a visual language multimodal model built upon the Qwen series language models and the CLIP visual encoder. It has been trained for 10 epochs on the LLaVA pre-training dataset and nearly 800K examples (150K instruction fine-tuning and 665K instruction mixed fine-tuning). However, due to data size is larger for model, so it can only perform simple question-answering tasks on images and currently supports only English question answering.
 
  ## Training Details
 