Ubuntu committed on
Commit
ba40b0b
1 Parent(s): e8b84fe
Files changed (2)
  1. README.md +2 -4
  2. README_en.md +100 -0
README.md CHANGED
@@ -15,7 +15,7 @@ tags:
  inference: false
  ---
 
- # glm-4v-9b
+ # GLM-4V-9B
 
  GLM-4V-9B is the open-source multimodal version of the GLM-4 series, the latest generation of pre-trained models launched by Zhipu AI.
  **GLM-4V-9B** supports bilingual (Chinese and English) multi-turn dialogue at a high resolution of 1120 * 1120. In multimodal evaluations covering comprehensive Chinese and English ability, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B has shown superior performance over GPT-4-turbo-2024-04-09, Gemini
@@ -73,12 +73,10 @@ with torch.no_grad():
      print(tokenizer.decode(outputs[0]))
  ```
 
- ## License Agreement (License)
+ ## License Agreement
 
  Use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
 
- Rhe use of the GLM-4 model weights needs to comply with the [LICENSE](LICENSE).
-
  ## Citation
 
  If you find our work helpful, please consider citing the following papers.
README_en.md ADDED
@@ -0,0 +1,100 @@
+ # GLM-4V-9B
+
+ GLM-4V-9B is the open-source multimodal version of the latest generation of pre-trained models in the GLM-4 series launched by Zhipu AI.
+ **GLM-4V-9B** supports multi-turn conversations in Chinese and English at a high resolution of 1120 * 1120. In multimodal evaluations of comprehensive Chinese and English ability, perception and reasoning, text recognition, and chart understanding, GLM-4V-9B has shown superior performance over GPT-4-turbo-2024-04-09, Gemini
+ 1.0 Pro, Qwen-VL-Max, and Claude 3 Opus.
+
+ ### Multimodal
+
+ GLM-4V-9B is a multimodal language model with visual understanding capabilities. Its results on standard benchmark tasks are as follows:
+
+
+ | **Model** | **MMBench-EN-Test** | **MMBench-CN-Test** | **SEEDBench_IMG** | **MMStar** | **MMMU** | **MME** | **HallusionBench** | **AI2D** | **OCRBench** |
+ |-------------------------|---------------------|---------------------|-------------------|------------|----------|---------|--------------------|----------|--------------|
+ | *Capability* | English comprehensive | Chinese comprehensive | Comprehensive ability | Comprehensive ability | Subject comprehensive | Perception and reasoning | Hallucination | Chart understanding | Text recognition |
+ | **GPT-4o, 20240513** | 83.4 | 82.1 | 77.1 | 63.9 | 69.2 | 2310.3 | 55 | 84.6 | 736 |
+ | **GPT-4v, 20240409** | 81 | 80.2 | 73 | 56 | 61.7 | 2070.2 | 43.9 | 78.6 | 656 |
+ | **GPT-4v, 20231106** | 77 | 74.4 | 72.3 | 49.7 | 53.8 | 1771.5 | 46.5 | 75.9 | 516 |
+ | **InternVL-Chat-V1.5** | 82.3 | 80.7 | 75.2 | 57.1 | 46.8 | 2189.6 | 47.4 | 80.6 | 720 |
+ | **LlaVA-Next-Yi-34B** | 81.1 | 79 | 75.7 | 51.6 | 48.8 | 2050.2 | 34.8 | 78.9 | 574 |
+ | **Step-1V** | 80.7 | 79.9 | 70.3 | 50 | 49.9 | 2206.4 | 48.4 | 79.2 | 625 |
+ | **MiniCPM-Llama3-V2.5** | 77.6 | 73.8 | 72.3 | 51.8 | 45.8 | 2024.6 | 42.4 | 78.4 | 725 |
+ | **Qwen-VL-Max** | 77.6 | 75.7 | 72.7 | 49.5 | 52 | 2281.7 | 41.2 | 75.7 | 684 |
+ | **GeminiProVision** | 73.6 | 74.3 | 70.7 | 38.6 | 49 | 2148.9 | 45.7 | 72.9 | 680 |
+ | **Claude-3V Opus** | 63.3 | 59.2 | 64 | 45.7 | 54.9 | 1586.8 | 37.8 | 70.6 | 694 |
+ | **GLM-4v-9B** | 81.1 | 79.4 | 76.8 | 58.7 | 47.2 | 2163.8 | 46.6 | 81.1 | 786 |
+
+
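One way to read the table is to rank the models on a single benchmark at a time. The sketch below copies the OCRBench and MMMU columns from the table into plain Python dictionaries and computes where GLM-4v-9B lands on each. The helper name `rank_of` is illustrative only and is not part of the repository.

```python
# Scores copied from the OCRBench and MMMU columns of the table.
OCRBENCH = {
    "GPT-4o, 20240513": 736, "GPT-4v, 20240409": 656, "GPT-4v, 20231106": 516,
    "InternVL-Chat-V1.5": 720, "LlaVA-Next-Yi-34B": 574, "Step-1V": 625,
    "MiniCPM-Llama3-V2.5": 725, "Qwen-VL-Max": 684, "GeminiProVision": 680,
    "Claude-3V Opus": 694, "GLM-4v-9B": 786,
}
MMMU = {
    "GPT-4o, 20240513": 69.2, "GPT-4v, 20240409": 61.7, "GPT-4v, 20231106": 53.8,
    "InternVL-Chat-V1.5": 46.8, "LlaVA-Next-Yi-34B": 48.8, "Step-1V": 49.9,
    "MiniCPM-Llama3-V2.5": 45.8, "Qwen-VL-Max": 52, "GeminiProVision": 49,
    "Claude-3V Opus": 54.9, "GLM-4v-9B": 47.2,
}


def rank_of(model: str, scores: dict) -> int:
    """Return the 1-based rank of `model`, with the highest score ranked first."""
    target = scores[model]
    return 1 + sum(1 for value in scores.values() if value > target)


print(rank_of("GLM-4v-9B", OCRBENCH))  # 1
print(rank_of("GLM-4v-9B", MMMU))      # 9
```

On these two columns GLM-4v-9B ranks first of eleven on OCRBench and ninth of eleven on MMMU, so the comparison depends on which benchmark is considered.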
+ **This repository hosts the GLM-4V-9B model, which supports an `8K` context length.**
+
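As a rough illustration of what an `8K` context means in practice, the sketch below checks whether a prompt plus the requested number of new tokens fits in the window. It assumes `8K` means 8192 tokens and that image inputs occupy positions in the same window; neither detail is stated in the README, and `fits_context` is a hypothetical helper, not an API of the model.

```python
CONTEXT_LEN = 8192  # assumption: "8K" is taken to mean 8 * 1024 tokens


def fits_context(prompt_tokens: int, max_new_tokens: int,
                 context_len: int = CONTEXT_LEN) -> bool:
    """Return True if the prompt and the new tokens together fit in the window."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    return prompt_tokens + max_new_tokens <= context_len


print(fits_context(1700, 800))  # True
print(fits_context(8000, 500))  # False
```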
+ ## Quick Start
+
+ ```python
+
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ device = "cuda"
+
+ tokenizer = AutoTokenizer.from_pretrained("THUDM/glm-4v-9b", trust_remote_code=True)
+
+ query = 'describe this image'
+ image = Image.open("your image").convert('RGB')  # replace "your image" with an image path
+ inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
+                                        add_generation_prompt=True, tokenize=True, return_tensors="pt",
+                                        return_dict=True)  # chat mode
+
+ inputs = inputs.to(device)
+ model = AutoModelForCausalLM.from_pretrained(
+     "THUDM/glm-4v-9b",
+     torch_dtype=torch.bfloat16,
+     low_cpu_mem_usage=True,
+     trust_remote_code=True
+ ).to(device).eval()
+
+ gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}  # top_k=1 makes sampling effectively greedy
+ with torch.no_grad():
+     outputs = model.generate(**inputs, **gen_kwargs)
+     outputs = outputs[:, inputs['input_ids'].shape[1]:]  # keep only the newly generated tokens
+     print(tokenizer.decode(outputs[0]))
+ ```
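The slice `outputs[:, inputs['input_ids'].shape[1]:]` in the example drops the prompt tokens, because `generate` returns the prompt followed by the continuation for decoder-only models. A minimal sketch of the same idea on plain Python lists follows; the token ids are made up for illustration, and `strip_prompt` is a hypothetical helper.

```python
def strip_prompt(sequences, prompt_len):
    """Drop the first `prompt_len` token ids from every sequence in the batch."""
    return [seq[prompt_len:] for seq in sequences]


prompt_ids = [101, 7, 8, 9]               # made-up prompt token ids
generated = [prompt_ids + [42, 43, 2]]    # batch of one: prompt followed by new tokens
print(strip_prompt(generated, len(prompt_ids)))  # [[42, 43, 2]]
```

Note that `max_length` in `gen_kwargs` bounds the prompt and the continuation together, whereas `max_new_tokens` would bound only the continuation.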
+
+ ## License
+
+ The use of the GLM-4 model weights must comply with the [LICENSE](LICENSE).
+
+ ## Citation
+
+ If you find our work helpful, please consider citing the following papers.
+
+ ```
+ @article{zeng2022glm,
+   title={Glm-130b: An open bilingual pre-trained model},
+   author={Zeng, Aohan and Liu, Xiao and Du, Zhengxiao and Wang, Zihan and Lai, Hanyu and Ding, Ming and Yang, Zhuoyi and Xu, Yifan and Zheng, Wendi and Xia, Xiao and others},
+   journal={arXiv preprint arXiv:2210.02414},
+   year={2022}
+ }
+ ```
+
+ ```
+ @inproceedings{du2022glm,
+   title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
+   author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
+   booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
+   pages={320--335},
+   year={2022}
+ }
+ ```
+
+ ```
+ @misc{wang2023cogvlm,
+   title={CogVLM: Visual Expert for Pretrained Language Models},
+   author={Weihan Wang and Qingsong Lv and Wenmeng Yu and Wenyi Hong and Ji Qi and Yan Wang and Junhui Ji and Zhuoyi Yang and Lei Zhao and Xixuan Song and Jiazheng Xu and Bin Xu and Juanzi Li and Yuxiao Dong and Ming Ding and Jie Tang},
+   year={2023},
+   eprint={2311.03079},
+   archivePrefix={arXiv},
+   primaryClass={cs.CV}
+ }
+ ```
+