chenkq committed on
Commit
a93e1d3
·
1 Parent(s): e5624c0

Update README.md

  1. README.md +65 -2
README.md CHANGED
@@ -15,6 +15,20 @@ language:
15
 
16
  # Quickstart
17
 
18
  ```python
19
  import torch
20
  import requests
@@ -31,7 +45,6 @@ model = AutoModelForCausalLM.from_pretrained(
31
 
32
 
33
  # chat example
34
-
35
  query = 'Describe this image'
36
  image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
37
  inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image]) # chat mode
@@ -56,7 +69,6 @@ with torch.no_grad():
56
 
57
 
58
  # vqa example
59
-
60
  query = 'How many houses are there in this cartoon?'
61
  image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/3.jpg?raw=true', stream=True).raw).convert('RGB')
62
  inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image], template_version='vqa') # vqa mode
@@ -76,6 +88,57 @@ with torch.no_grad():
76
  # 4</s>
77
  ```
78
 
79
  # Method
80
 
81
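  The CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a **visual expert module**. See the [Paper](https://github.com/THUDM/CogVLM/blob/main/assets/cogvlm-paper.pdf) for more details.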
 
15
 
16
  # Quickstart
17
 
18
+ Hardware requirements
19
+
20
+ Inference requires nearly 40GB of GPU VRAM. If no single GPU has more than 40GB of VRAM, use the `accelerate` library to dispatch the model across multiple GPUs with smaller VRAM.
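The rule above can be sketched as a small decision helper. This is only a rough heuristic under stated assumptions: `placement_strategy` is a hypothetical name, the 40 GiB threshold is the approximate figure quoted in this README, and the per-GPU sizes are passed in by hand (in practice they could be read from `torch.cuda.get_device_properties(i).total_memory`, which reports bytes). Summing VRAM across cards ignores activation overhead, so treat a "multi" result as a starting point rather than a guarantee.

```python
REQUIRED_GIB = 40  # approximate figure quoted in this README, not a measured value


def placement_strategy(gpu_memory_gib, required_gib=REQUIRED_GIB):
    """Classify a list of per-GPU VRAM sizes (in GiB).

    Returns 'single' if one GPU can hold the model, 'multi' if the combined
    VRAM of all GPUs is large enough, and 'insufficient' otherwise.
    """
    if any(mem >= required_gib for mem in gpu_memory_gib):
        return "single"
    if sum(gpu_memory_gib) >= required_gib:
        return "multi"
    return "insufficient"


print(placement_strategy([80]))      # single
print(placement_strategy([24, 24]))  # multi
print(placement_strategy([16]))      # insufficient
```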
23
+
24
+ Install dependencies
25
+
26
+ ```bash
27
+ pip install torch==2.1.0 transformers==4.35.0 accelerate==0.24.1 sentencepiece==0.1.99 einops==0.7.0 xformers==0.0.22.post7 triton==2.1.0
28
+ ```
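After installing, it can help to confirm that the environment actually matches the pinned versions. The following is a minimal sketch using only the standard library; `PINNED` simply mirrors the `pip install` line above, and `find_mismatches` is a hypothetical helper name. It strips any local version suffix (for example a `+cu121` tag on CUDA builds of torch) before comparing, which is an assumption about how such builds report their version.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions copied from the pip command above.
PINNED = {
    "torch": "2.1.0",
    "transformers": "4.35.0",
    "accelerate": "0.24.1",
    "sentencepiece": "0.1.99",
    "einops": "0.7.0",
    "xformers": "0.0.22.post7",
    "triton": "2.1.0",
}


def find_mismatches(pinned):
    """Return {package: (installed_or_None, pinned)} for every package that differs."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            # Drop a local version suffix such as '+cu121' before comparing.
            installed = version(name).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (installed, wanted)
    return mismatches


for name, (installed, wanted) in find_mismatches(PINNED).items():
    print(f"{name}: installed={installed}, expected={wanted}")
```

An empty result means every pinned package is present at the expected version.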
29
+
30
+ Code example
31
+
32
  ```python
33
  import torch
34
  import requests
 
45
 
46
 
47
  # chat example
 
48
  query = 'Describe this image'
49
  image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
50
  inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image]) # chat mode
 
69
 
70
 
71
  # vqa example
 
72
  query = 'How many houses are there in this cartoon?'
73
  image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/3.jpg?raw=true', stream=True).raw).convert('RGB')
74
  inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image], template_version='vqa') # vqa mode
 
88
  # 4</s>
89
  ```
90
 
91
+ When a single GPU does not have enough VRAM, dispatch the model across multiple GPUs with smaller VRAM:
94
+
95
+ ```python
96
+ import torch
97
+ import requests
98
+ from PIL import Image
99
+ from transformers import AutoModelForCausalLM, LlamaTokenizer
100
+ from accelerate import init_empty_weights, infer_auto_device_map, load_checkpoint_and_dispatch
101
+
102
+ tokenizer = LlamaTokenizer.from_pretrained('lmsys/vicuna-7b-v1.5')
103
+ with init_empty_weights():
104
+     model = AutoModelForCausalLM.from_pretrained(
105
+         'THUDM/cogvlm-chat-hf',
106
+         torch_dtype=torch.bfloat16,
107
+         low_cpu_mem_usage=True,
108
+         trust_remote_code=True,
109
+     )
110
+ device_map = infer_auto_device_map(model, max_memory={0: '20GiB', 1: '20GiB', 'cpu': '16GiB'}, no_split_module_classes=['CogVLMDecoderLayer'])
111
+ model = load_checkpoint_and_dispatch(
112
+     model,
113
+     'local/path/to/hf/version/chat/model',  # typically '~/.cache/huggingface/hub/models--THUDM--cogvlm-chat-hf/snapshots/<snapshot-hash>'
114
+     device_map=device_map,
115
+ )
116
+ model = model.eval()
117
+
118
+ # optionally, check which device each weight was placed on
119
+ for n, p in model.named_parameters():
120
+     print(f"{n}: {p.device}")
121
+
122
+ # chat example
123
+ query = 'Describe this image'
124
+ image = Image.open(requests.get('https://github.com/THUDM/CogVLM/blob/main/examples/1.png?raw=true', stream=True).raw).convert('RGB')
125
+ inputs = model.build_conversation_input_ids(tokenizer, query=query, history=[], images=[image]) # chat mode
126
+ inputs = {
127
+     'input_ids': inputs['input_ids'].unsqueeze(0).to('cuda'),
128
+     'token_type_ids': inputs['token_type_ids'].unsqueeze(0).to('cuda'),
129
+     'attention_mask': inputs['attention_mask'].unsqueeze(0).to('cuda'),
130
+     'images': [[inputs['images'][0].to('cuda').to(torch.bfloat16)]],
131
+ }
132
+ gen_kwargs = {"max_length": 2048, "do_sample": False}
133
+
134
+ with torch.no_grad():
135
+     outputs = model.generate(**inputs, **gen_kwargs)
136
+     outputs = outputs[:, inputs['input_ids'].shape[1]:]
137
+     print(tokenizer.decode(outputs[0]))
138
+ ```
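The `max_memory` mapping in the example is hard-coded for two GPUs. For other setups, a small helper along these lines could build the same structure; `build_max_memory` is a hypothetical name, and the default CPU budget just repeats the `16GiB` value from the example. Choosing a per-GPU budget somewhat below the card's physical VRAM, to leave room for activations, is an assumption here rather than something this README specifies.

```python
def build_max_memory(num_gpus, per_gpu_gib, cpu_gib=16):
    """Build a max_memory mapping: integer GPU index -> 'NGiB', plus a 'cpu' entry."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    max_memory = {i: f"{per_gpu_gib}GiB" for i in range(num_gpus)}
    max_memory["cpu"] = f"{cpu_gib}GiB"
    return max_memory


print(build_max_memory(2, 20))
# {0: '20GiB', 1: '20GiB', 'cpu': '16GiB'}
```

The result can be passed as the `max_memory` argument of `infer_auto_device_map` in place of the literal dictionary.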
139
+
140
+
141
+
142
  # Method
143
 
144
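  The CogVLM model comprises four fundamental components: a vision transformer (ViT) encoder, an MLP adapter, a pretrained large language model (GPT), and a **visual expert module**. See the [Paper](https://github.com/THUDM/CogVLM/blob/main/assets/cogvlm-paper.pdf) for more details.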