simonJJJ commited on
Commit
9783357
1 Parent(s): 097b63b
Files changed (1) hide show
  1. README.md +3 -3
README.md CHANGED
@@ -18,7 +18,7 @@ inference: false
18
  <br>
19
 
20
  <p align="center">
21
- Qwen-VL <a href="https://modelscope.cn/models/qwen/Qwen-VL/summary">🤖 <a> | <a href="https://huggingface.co/Qwen/Qwen-VL">🤗</a>&nbsp | Qwen-VL-Chat <a href="https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary">🤖 <a>| <a href="https://huggingface.co/Qwen/Qwen-VL-Chat">🤗</a>&nbsp | &nbsp<a href="https://modelscope.cn/studios/qwen/Qwen-VL-Chat-Demo/summary">Demo</a>&nbsp | &nbsp<a href="https://github.com/QwenLM/Qwen-VL/blob/main/visual_memo.md">Report</a>&nbsp&nbsp | &nbsp&nbsp<a href="https://discord.gg/9bjvspyu">Discord</a>
22
 
23
  </p>
24
  <br>
@@ -41,13 +41,13 @@ inference: false
41
  - Qwen-VL: Qwen-VL 以 Qwen-7B 的预训练模型作为语言模型的初始化,并以 [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) 作为视觉编码器的初始化,中间加入单层随机初始化的 cross-attention,经过约1.5B的图文数据训练得到。最终图像输入分辨率为448。
42
  - Qwen-VL-Chat: 在 Qwen-VL 的基础上,我们使用对齐机制打造了基于大语言模型的视觉AI助手Qwen-VL-Chat,其训练数据涵盖了 QWen-7B 的纯文本 SFT 数据、开源 LVLM 的 SFT 数据、数据合成和人工标注的图文对齐数据。
43
 
44
- 如果想了解更多关于模型的信息,请点击[链接](visual_memo.md)查看我们的技术备忘录。
45
 
46
  We release two models of the Qwen-VL series:
47
  - Qwen-VL: The pre-trained LVLM model uses Qwen-7B as the initialization of the LLM, and [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) as the initialization of the visual encoder. And connects them with a randomly initialized cross-attention layer. Qwen-VL was trained on about 1.5B image-text paired data. The final image input resolution is 448.
48
  - Qwen-VL-Chat: A multimodal LLM-based AI assistant, which is trained with alignment techniques.
49
 
50
- For more details about Qwen-VL, please refer to our [technical memo](visual_memo.md).
51
 
52
  ## 评测
53
 
 
18
  <br>
19
 
20
  <p align="center">
21
+ Qwen-VL <a href="https://modelscope.cn/models/qwen/Qwen-VL/summary">🤖 <a> | <a href="https://huggingface.co/Qwen/Qwen-VL">🤗</a>&nbsp | Qwen-VL-Chat <a href="https://modelscope.cn/models/qwen/Qwen-VL-Chat/summary">🤖 <a>| <a href="https://huggingface.co/Qwen/Qwen-VL-Chat">🤗</a>&nbsp | &nbsp<a href="https://modelscope.cn/studios/qwen/Qwen-VL-Chat-Demo/summary">Demo</a>&nbsp | &nbsp<a href="https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md">Report</a>&nbsp&nbsp | &nbsp&nbsp<a href="https://discord.gg/9bjvspyu">Discord</a>
22
 
23
  </p>
24
  <br>
 
41
  - Qwen-VL: Qwen-VL 以 Qwen-7B 的预训练模型作为语言模型的初始化,并以 [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) 作为视觉编码器的初始化,中间加入单层随机初始化的 cross-attention,经过约1.5B的图文数据训练得到。最终图像输入分辨率为448。
42
  - Qwen-VL-Chat: 在 Qwen-VL 的基础上,我们使用对齐机制打造了基于大语言模型的视觉AI助手Qwen-VL-Chat,其训练数据涵盖了 QWen-7B 的纯文本 SFT 数据、开源 LVLM 的 SFT 数据、数据合成和人工标注的图文对齐数据。
43
 
44
+ 如果想了解更多关于模型的信息,请点击[链接](https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md)查看我们的技术备忘录。
45
 
46
  We release two models of the Qwen-VL series:
47
  - Qwen-VL: The pre-trained LVLM model uses Qwen-7B as the initialization of the LLM, and [Openclip ViT-bigG](https://github.com/mlfoundations/open_clip) as the initialization of the visual encoder. And connects them with a randomly initialized cross-attention layer. Qwen-VL was trained on about 1.5B image-text paired data. The final image input resolution is 448.
48
  - Qwen-VL-Chat: A multimodal LLM-based AI assistant, which is trained with alignment techniques.
49
 
50
+ For more details about Qwen-VL, please refer to our [technical memo](https://github.com/QwenLM/Qwen-VL/blob/master/visual_memo.md).
51
 
52
  ## 评测
53