zxdu20 committed
Commit f4f759a
Parents: ab02f3d, c57199f

Merge branch 'main' of https://huggingface.co/THUDM/visualglm-6b

Files changed (2):
  1. MODEL_LICENSE +3 -3
  2. README.md +5 -4
MODEL_LICENSE CHANGED
@@ -1,10 +1,10 @@
-The GLM-130B License
+The VisualGLM-6B License
 
 1. Definitions
 
-“Licensor” means the GLM-130B Model Team that distributes its Software.
+“Licensor” means the VisualGLM-6B Model Team that distributes its Software.
 
-“Software” means the GLM-130B model parameters made available under this license.
+“Software” means the VisualGLM-6B model parameters made available under this license.
 
 2. License Grant
 
README.md CHANGED
@@ -4,6 +4,7 @@ language:
 - en
 tags:
 - glm
+- visualglm
 - chatglm
 - thudm
 ---
@@ -17,7 +18,7 @@ tags:
 </p>
 
 ## Introduction
-CVisualGLM-6B is an open-source multimodal dialogue language model supporting **images, Chinese, and English**. Its language model is based on [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) with 6.2 billion parameters; the image part builds a bridge between the visual model and the language model by training [BLIP2-Qformer](https://arxiv.org/abs/2301.12597), for a total of 7.8 billion parameters.
+VisualGLM-6B is an open-source multimodal dialogue language model supporting **images, Chinese, and English**. Its language model is based on [ChatGLM-6B](https://github.com/THUDM/ChatGLM-6B) with 6.2 billion parameters; the image part builds a bridge between the visual model and the language model by training [BLIP2-Qformer](https://arxiv.org/abs/2301.12597), for a total of 7.8 billion parameters.
 
 VisualGLM-6B is pre-trained on 30M high-quality Chinese image-text pairs from the [CogView](https://arxiv.org/abs/2105.13290) dataset together with 300M filtered English image-text pairs, with Chinese and English weighted equally. This pre-training aligns visual information well with the semantic space of ChatGLM; in the subsequent fine-tuning stage, the model is trained on long visual question answering data to generate answers that match human preferences.
 
@@ -33,12 +34,12 @@ pip install SwissArmyTransformer>=0.3.6 torch>1.10.0 torchvision transformers>=4
 
 ```ipython
 >>> from transformers import AutoTokenizer, AutoModel
->>> tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
->>> model = AutoModel.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True).half().cuda()
+>>> tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
+>>> model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()
 >>> image_path = "your image path"
 >>> response, history = model.chat(tokenizer, image_path, "描述这张图片。", history=[])
 >>> print(response)
->>> response, history = model.chat(tokenizer, "这张图片可能是在什么场所拍摄的?", history=history)
+>>> response, history = model.chat(tokenizer, image_path, "这张图片可能是在什么场所拍摄的?", history=history)
 >>> print(response)
 ```
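For quick verification, the corrected snippet from this commit assembles into the self-contained script below. This is a minimal sketch, not part of the commit: it assumes a CUDA GPU with enough memory for the FP16 weights, and `example.jpg` is a hypothetical placeholder for a local image file. Note that `model.chat` takes the image path on every turn, which is exactly the omission the last hunk fixes.

```python
# Minimal sketch of the corrected README usage; assumes a CUDA GPU with
# enough memory for the FP16 weights. "example.jpg" is a hypothetical
# placeholder for a local image file.
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True)
model = AutoModel.from_pretrained("THUDM/visualglm-6b", trust_remote_code=True).half().cuda()

image_path = "example.jpg"  # hypothetical path; point this at your own image

# First turn: describe the image, starting from an empty history.
response, history = model.chat(tokenizer, image_path, "描述这张图片。", history=[])
print(response)

# Follow-up turn: pass image_path again (the argument this commit restores)
# along with the accumulated history.
response, history = model.chat(tokenizer, image_path, "这张图片可能是在什么场所拍摄的?", history=history)
print(response)
```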