xiao-long committed on
Commit 0211caf
1 parent: 1ea047a

Update README.md

Files changed (1)
  1. README.md +101 -18
README.md CHANGED
@@ -31,29 +31,68 @@ The vision of OpenCSG is to empower every industry, every company, and every ind
  ## Model Description

- **csg-wukong-1B-VL-v0.1** was finetuned on [csg-wukong-1B](https://huggingface.co/opencsg/csg-wukong-1B).
- <br>
- we will introduce more information about this model.

- ## Model Evaluation results

- We submitted csg-wukong-1B on the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and
- the results show our model ranked the 8th among the ~1.5B pretrained small language models.

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/661790397437201d78141856/_HRTxL6N0qnNPNt-P8k9k.png)

- # Training

- ## Hardware

- - **GPUs:** 16 H800
- - **Training time:** 43days

  ## Software

  - **Orchestration:** [Deepspeed](https://github.com/OpenCSGs)
@@ -88,22 +127,66 @@ The vision of OpenCSG is to let every industry, every company, and every individual have their own
  ## Model Description

- **csg-wukong-1B-VL-v0.1** was finetuned from the pretrained model [csg-wukong-1B](https://huggingface.co/opencsg/csg-wukong-1B).
- <br>

- We will introduce more information about this model later.

- ## Model Evaluation results

- We submitted the csg-wukong-1B model to the [open_llm_leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard), and the results show our model currently ranks 8th among the ~1.5B small language models.

- ![image/png](https://cdn-uploads.huggingface.co/production/uploads/661790397437201d78141856/ZfWZ1Fd7ccKrJVx0okV9z.png)

- # Training

  ## Hardware
 
  ## Model Description


+ [CSG-VL](https://github.com/OpenCSGs/csg-vl) is a family of small but strong multimodal models. It offers multiple plug-and-play vision encoders, such as EVA-CLIP and SigLIP, and several language backbones, including Wukong-1B, Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5, and Phi-2.


+ ## Quickstart

+ Here is a code snippet showing how to use the model with the `transformers` library.

+ Before running the snippet, you need to install the following dependencies:
 
+ ```shell
+ pip install torch transformers accelerate pillow
+ ```
 
+ If you have enough CUDA memory, the snippet runs faster with `CUDA_VISIBLE_DEVICES=0` set and the device in the snippet switched to `'cuda'`.

+ Users, especially those in mainland China, may want to use a Hugging Face [mirror site](https://hf-mirror.com).

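+ For example, both settings can be exported before launching the script (a minimal sketch: `quickstart.py` is only a placeholder name for the snippet below saved to a file, and `HF_ENDPOINT` is the environment variable the mirror documents for redirecting `huggingface_hub` downloads):
+
+ ```shell
+ # pin the run to the first GPU and, optionally, fetch weights through the mirror
+ export CUDA_VISIBLE_DEVICES=0
+ export HF_ENDPOINT=https://hf-mirror.com
+ python quickstart.py
+ ```
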
+ ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from PIL import Image
+ import warnings
+
+ # disable some warnings
+ transformers.logging.set_verbosity_error()
+ transformers.logging.disable_progress_bar()
+ warnings.filterwarnings('ignore')
+
+ # set device
+ torch.set_default_device('cpu')  # or 'cuda'
+
+ model_name = 'opencsg/csg-wukong-1B-VL-v0.1'
+ # create model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.float16,
+     device_map='auto',
+     trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name,
+     trust_remote_code=True)
+
+ # text prompt (chat-style template with an <image> placeholder)
+ prompt = 'What is the astronaut holding in his hand?'
+ text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
+ text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
+ # insert the image placeholder token id (-200) between the two text chunks
+ input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
+ image = Image.open('example_1.png')
+ image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
+
+ # generate
+ output_ids = model.generate(
+     input_ids,
+     images=image_tensor,
+     max_new_tokens=100,
+     use_cache=True)[0]
+
+ print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
+ ```
  ## Software

  - **Orchestration:** [Deepspeed](https://github.com/OpenCSGs)
 
  ## Model Description


+ [CSG-VL](https://github.com/OpenCSGs/csg-vl) is a family of small but strong multimodal models. It offers multiple plug-and-play vision encoders, such as EVA-CLIP and SigLIP, and several language backbones, including Wukong-1B, Llama-3-8B, Phi-1.5, StableLM-2, Qwen1.5, and Phi-2.
+
+
+ ## Quickstart
+ Here is a code snippet showing how to use the model with the `transformers` library.
+
+ Before running the snippet, you need to install the following dependencies:
+
+ ```shell
+ pip install torch transformers accelerate pillow
+ ```
+
+ If you have enough CUDA memory, the snippet runs faster with `CUDA_VISIBLE_DEVICES=0` set and the device in the snippet switched to `'cuda'`.

+ Users, especially those in mainland China, may want to use a Hugging Face [mirror site](https://hf-mirror.com).

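+ As in the English section above, both settings can be exported before launching the script (a minimal sketch; `quickstart.py` is only a placeholder name for the snippet below saved to a file):
+
+ ```shell
+ # pin the run to the first GPU and, optionally, fetch weights through the mirror
+ export CUDA_VISIBLE_DEVICES=0
+ export HF_ENDPOINT=https://hf-mirror.com
+ python quickstart.py
+ ```
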
+ ```python
+ import torch
+ import transformers
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+ from PIL import Image
+ import warnings
+
+ # disable some warnings
+ transformers.logging.set_verbosity_error()
+ transformers.logging.disable_progress_bar()
+ warnings.filterwarnings('ignore')
+
+ # set device
+ torch.set_default_device('cpu')  # or 'cuda'
+
+ model_name = 'opencsg/csg-wukong-1B-VL-v0.1'
+ # create model
+ model = AutoModelForCausalLM.from_pretrained(
+     model_name,
+     torch_dtype=torch.float16,
+     device_map='auto',
+     trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained(
+     model_name,
+     trust_remote_code=True)
+
+ # text prompt (chat-style template with an <image> placeholder)
+ prompt = 'What is the astronaut holding in his hand?'
+ text = f"A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: <image>\n{prompt} ASSISTANT:"
+ text_chunks = [tokenizer(chunk).input_ids for chunk in text.split('<image>')]
+ # insert the image placeholder token id (-200) between the two text chunks
+ input_ids = torch.tensor(text_chunks[0] + [-200] + text_chunks[1], dtype=torch.long).unsqueeze(0)
+ image = Image.open('example_1.png')
+ image_tensor = model.process_images([image], model.config).to(dtype=model.dtype)
+
+ # generate
+ output_ids = model.generate(
+     input_ids,
+     images=image_tensor,
+     max_new_tokens=100,
+     use_cache=True)[0]
+
+ print(tokenizer.decode(output_ids[input_ids.shape[1]:], skip_special_tokens=True).strip())
+ ```

  ## Hardware
192