Commit ef43f6e, committed by czczup
1 Parent(s): ff11e97

Upload folder using huggingface_hub

Files changed (4)
  1. README.md +59 -15
  2. config.json +1 -1
  3. conversation.py +7 -4
  4. tokenization_internlm2_fast.py +211 -0
README.md CHANGED
@@ -30,19 +30,18 @@ This article comprises the following sections:
  <!-- toc -->

  - [Inference](#inference)
- - [Evaluation](#evaluation)
  - [Service](#service)

  <!-- tocstop -->

  ## Inference

- For lmdeploy v0.5.0, please configure the chat template config first. Create the following JSON file `chat_template.json`.
+ To deploy InternVL2, please configure the chat template config first. Create the following JSON file `chat_template.json`.

  ```json
  {
- "model_name":"internlm2",
- "meta_instruction":"我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。人工智能实验室致力于原始技术创新,开源开放,共享共创,推动科技进步和产业发展。",
+ "model_name":"internvl-internlm2",
+ "meta_instruction":"我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。",
  "stop_words":["<|im_start|>", "<|im_end|>"]
  }
  ```
@@ -52,34 +51,79 @@ Trying the following codes, you can perform the batched offline inference with t
  ```python
  from lmdeploy import pipeline
  from lmdeploy.model import ChatTemplateConfig
+ from lmdeploy.messages import TurbomindEngineConfig
  from lmdeploy.vl import load_image

  model = 'OpenGVLab/InternVL2-2B-AWQ'
  chat_template_config = ChatTemplateConfig.from_json('chat_template.json')
  image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
- pipe = pipeline(model, chat_template_config=chat_template_config, log_level='INFO')
+ backend_config = TurbomindEngineConfig(model_format='awq')
+ pipe = pipeline(model, chat_template_config=chat_template_config,
+                 backend_config=backend_config,
+                 log_level='INFO')
  response = pipe(('describe this image', image))
  print(response)
  ```

  For more information about the pipeline parameters, please refer to [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/pipeline.md).

- ## Evaluation
-
- Please overview [this guide](https://opencompass.readthedocs.io/en/latest/advanced_guides/evaluation_turbomind.html) about model evaluation with LMDeploy.
-
  ## Service

- LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup:
+ LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below are an example of service startup.

  ```shell
- lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --backend turbomind --model-format awq --chat-template chat_template.json
+ lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ --model-name InternVL2-2B-AWQ --backend turbomind --server-port 23333 --model-format awq --chat-template chat_template.json
  ```

- The default port of `api_server` is `23333`. After the server is launched, you can communicate with server on terminal through `api_client`:
+ To use the OpenAI-style interface, you need to install OpenAI:

- ```shell
- lmdeploy serve api_client http://0.0.0.0:23333
+ Then, use the code below to make the API call:
+
+ ```python
+ from openai import OpenAI
+
+ client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
+ model_name = client.models.list().data[0].id
+ response = client.chat.completions.create(
+     model="InternVL2-2B-AWQ",
+     messages=[{
+         'role':
+         'user',
+         'content': [{
+             'type': 'text',
+             'text': 'describe this image',
+         }, {
+             'type': 'image_url',
+             'image_url': {
+                 'url':
+                 'https://modelscope.oss-cn-beijing.aliyuncs.com/resource/tiger.jpeg',
+             },
+         }],
+     }],
+     temperature=0.8,
+     top_p=0.8)
+ print(response)
  ```

- You can overview and try out `api_server` APIs online by swagger UI at `http://0.0.0.0:23333`, or you can also read the API specification from [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/serving/restful_api.md).
+ ## License
+
+ This project is released under the MIT license, while InternLM is licensed under the Apache-2.0 license.
+
+ ## Citation
+
+ If you find this project useful in your research, please consider citing:
+
+ ```BibTeX
+ @article{chen2023internvl,
+   title={InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks},
+   author={Chen, Zhe and Wu, Jiannan and Wang, Wenhai and Su, Weijie and Chen, Guo and Xing, Sen and Zhong, Muyan and Zhang, Qinglong and Zhu, Xizhou and Lu, Lewei and Li, Bin and Luo, Ping and Lu, Tong and Qiao, Yu and Dai, Jifeng},
+   journal={arXiv preprint arXiv:2312.14238},
+   year={2023}
+ }
+ @article{chen2024far,
+   title={How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites},
+   author={Chen, Zhe and Wang, Weiyun and Tian, Hao and Ye, Shenglong and Gao, Zhangwei and Cui, Erfei and Tong, Wenwen and Hu, Kongzhi and Luo, Jiapeng and Ma, Zheng and others},
+   journal={arXiv preprint arXiv:2404.16821},
+   year={2024}
+ }
+ ```
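The updated README has readers create `chat_template.json` by hand. That file is then consumed by `ChatTemplateConfig.from_json('chat_template.json')` or by `--chat-template chat_template.json`. As a minimal sketch, the same file can be generated with Python's standard `json` module. The helper names `write_chat_template` and `load_chat_template` are hypothetical, and the field values are copied from the new side of the diff.

For readers who do not read Chinese, the new `meta_instruction` says roughly: "I am 书生·万象 (English name InternVL), a multimodal large language model jointly developed by Shanghai AI Laboratory and several partner institutions." The sentence this commit drops from the old prompt reads roughly: "The AI Laboratory is committed to original technological innovation, open source and openness, and shared co-creation, advancing scientific progress and industrial development."

```python
import json
from pathlib import Path

# Field values copied from the new side of the README diff.
CHAT_TEMPLATE = {
    'model_name': 'internvl-internlm2',
    'meta_instruction': '我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。',
    'stop_words': ['<|im_start|>', '<|im_end|>'],
}


def write_chat_template(path: str = 'chat_template.json') -> Path:
    """Serialize CHAT_TEMPLATE to `path` as UTF-8 JSON (hypothetical helper)."""
    out = Path(path)
    # ensure_ascii=False keeps the Chinese meta_instruction human-readable in the file.
    out.write_text(json.dumps(CHAT_TEMPLATE, ensure_ascii=False, indent=4), encoding='utf-8')
    return out


def load_chat_template(path: str = 'chat_template.json') -> dict:
    """Read the file back, e.g. to sanity-check it before starting lmdeploy."""
    return json.loads(Path(path).read_text(encoding='utf-8'))
```

Calling `write_chat_template()` from the directory where the `lmdeploy serve api_server ... --chat-template chat_template.json` command is run produces the file that command expects. With the default `ensure_ascii=True`, `json.dumps` would write `\uXXXX` escapes instead, which parse back to the same string.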
config.json CHANGED
@@ -95,7 +95,7 @@
  "tie_word_embeddings": false,
  "tokenizer_class": null,
  "top_k": 50,
- "top_p": null,
+ "top_p": 1.0,
  "torch_dtype": "bfloat16",
  "torchscript": false,
  "transformers_version": "4.40.1",
conversation.py CHANGED
@@ -330,13 +330,16 @@ def get_conv_template(name: str) -> Conversation:
      return conv_templates[name].copy()


- # Note that for inference, using the Hermes-2 and internlm2-chat templates is equivalent.
+ # Both Hermes-2 and internlm2-chat are chatml-format conversation templates. The difference
+ # is that during training, the preprocessing function for the Hermes-2 template doesn't add
+ # <s> at the beginning of the tokenized sequence, while the internlm2-chat template does.
+ # Therefore, they are completely equivalent during inference.
  register_conv_template(
      Conversation(
          name='Hermes-2',
          system_template='<|im_start|>system\n{system_message}',
          # note: The new system prompt was not used here to avoid changes in benchmark performance.
-         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。人工智能实验室致力于原始技术创新,开源开放,共享共创,推动科技进步和产业发展。',
+         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。',
          system_message='你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。',
          roles=('<|im_start|>user\n', '<|im_start|>assistant\n'),
          sep_style=SeparatorStyle.MPT,
@@ -357,7 +360,7 @@ register_conv_template(
          name='internlm2-chat',
          system_template='<|im_start|>system\n{system_message}',
          # note: The new system prompt was not used here to avoid changes in benchmark performance.
-         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。人工智能实验室致力于原始技术创新,开源开放,共享共创,推动科技进步和产业发展。',
+         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。',
          system_message='你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。',
          roles=('<|im_start|>user\n', '<|im_start|>assistant\n'),
          sep_style=SeparatorStyle.MPT,
@@ -376,7 +379,7 @@ register_conv_template(
          name='phi3-chat',
          system_template='<|system|>\n{system_message}',
          # note: The new system prompt was not used here to avoid changes in benchmark performance.
-         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。人工智能实验室致力于原始技术创新,开源开放,共享共创,推动科技进步和产业发展。',
+         # system_message='我是书生·万象,英文名是InternVL,是由上海人工智能实验室及多家合作单位联合开发的多模态大语言模型。',
          system_message='你是由上海人工智能实验室联合商汤科技开发的书生多模态大模型,英文名叫InternVL, 是一个有用无害的人工智能助手。',
          roles=('<|user|>\n', '<|assistant|>\n'),
          sep_style=SeparatorStyle.MPT,
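The new comment in `conversation.py` says Hermes-2 and internlm2-chat differ only in whether training-time preprocessing prepends `<s>`, and are therefore equivalent at inference. The sketch below illustrates that claim by assembling a single-turn prompt the way a FastChat-style `SeparatorStyle.MPT` template does: the system prompt plus separator, then each role followed by its message and separator, with an empty message leaving the assistant role open.

This is not the repository's implementation. The function name `build_mpt_prompt`, the `add_bos` flag, and the English placeholder system message are hypothetical. The separator `<|im_end|>` is also an assumption, because the `sep=` argument lies outside these hunks. For reference, the registered `system_message` translates roughly as: "You are the 书生 multimodal large model developed by Shanghai AI Laboratory together with SenseTime, with the English name InternVL; you are a helpful and harmless AI assistant."

```python
from typing import List, Optional, Tuple

# Copied from the Hermes-2 / internlm2-chat registrations in the diff.
SYSTEM_TEMPLATE = '<|im_start|>system\n{system_message}'
ROLES = ('<|im_start|>user\n', '<|im_start|>assistant\n')
# Assumption: separator used by both templates (not shown in the diff).
SEP = '<|im_end|>'


def build_mpt_prompt(system_message: str,
                     messages: List[Tuple[str, Optional[str]]],
                     add_bos: bool = False) -> str:
    """Hypothetical MPT-style prompt assembly; add_bos mimics prepending <s>."""
    ret = SYSTEM_TEMPLATE.format(system_message=system_message) + SEP
    for role, message in messages:
        if message:
            # Completed turn: role tag, text, then the separator.
            ret += role + message + SEP
        else:
            # Open turn: leave the role tag for the model to continue.
            ret += role
    return ('<s>' + ret) if add_bos else ret


turns = [(ROLES[0], 'describe this image'), (ROLES[1], None)]
hermes = build_mpt_prompt('You are a helpful assistant.', turns)
internlm2 = build_mpt_prompt('You are a helpful assistant.', turns, add_bos=True)
print(repr(hermes))
print(internlm2 == '<s>' + hermes)  # True
```

In this sketch the two prompts differ only by the leading `<s>`, which matches the comment's description of where the templates diverge.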
tokenization_internlm2_fast.py ADDED
@@ -0,0 +1,211 @@
+ # Copyright (c) The InternLM team and The HuggingFace Inc. team. All rights reserved.
+ #
+ # This code is based on transformers/src/transformers/models/llama/tokenization_llama_fast.py
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ """Tokenization Fast class for InternLM."""
+ import os
+ from shutil import copyfile
+ from typing import Any, Dict, Optional, Tuple
+
+ from tokenizers import Tokenizer, decoders, normalizers, processors
+ from tokenizers.models import BPE
+ from transformers.convert_slow_tokenizer import (SLOW_TO_FAST_CONVERTERS,
+                                                  SentencePieceExtractor,
+                                                  SpmConverter)
+ from transformers.tokenization_utils_fast import PreTrainedTokenizerFast
+ from transformers.utils import logging
+
+ from .tokenization_internlm2 import InternLM2Tokenizer
+
+ logger = logging.get_logger(__name__)
+
+ VOCAB_FILES_NAMES = {'vocab_file': './tokenizer.model'}
+
+
+ # Modified from transformers.convert_slow_tokenizer.LlamaConverter
+ class InternLM2Converter(SpmConverter):
+     handle_byte_fallback = True
+
+     def vocab(self, proto):
+         vocab = [
+             ('<unk>', 0.0),
+             ('<s>', 0.0),
+             ('</s>', 0.0),
+         ]
+         vocab += [(piece.piece, piece.score) for piece in proto.pieces[3:]]
+         return vocab
+
+     def unk_id(self, proto):
+         unk_id = 0
+         return unk_id
+
+     def decoder(self, replacement, add_prefix_space):
+         return decoders.Sequence(
+             [
+                 decoders.Replace('▁', ' '),
+                 decoders.ByteFallback(),
+                 decoders.Fuse(),
+                 decoders.Strip(content=' ', left=1),
+             ]
+         )
+
+     def tokenizer(self, proto):
+         model_type = proto.trainer_spec.model_type
+         vocab_scores = self.vocab(proto)
+         # special tokens
+         added_tokens = self.original_tokenizer.added_tokens_decoder
+         for i in range(len(vocab_scores)):
+             piece, score = vocab_scores[i]
+             if i in added_tokens:
+                 vocab_scores[i] = (added_tokens[i].content, score)
+         if model_type == 1:
+             raise RuntimeError('InternLM2 is supposed to be a BPE model!')
+
+         elif model_type == 2:
+             _, merges = SentencePieceExtractor(self.original_tokenizer.vocab_file).extract(vocab_scores)
+             bpe_vocab = {word: i for i, (word, _score) in enumerate(vocab_scores)}
+             tokenizer = Tokenizer(
+                 BPE(bpe_vocab, merges, unk_token=proto.trainer_spec.unk_piece, fuse_unk=True, byte_fallback=True)
+             )
+             tokenizer.add_special_tokens(
+                 [ added_token for index, added_token in added_tokens.items()]
+             )
+         else:
+             raise Exception(
+                 "You're trying to run a `Unigram` model but you're file was trained with a different algorithm"
+             )
+
+         return tokenizer
+
+     def normalizer(self, proto):
+         normalizers_list = []
+         if proto.normalizer_spec.add_dummy_prefix:
+             normalizers_list.append(normalizers.Prepend(prepend='▁'))
+         normalizers_list.append(normalizers.Replace(pattern=' ', content='▁'))
+         return normalizers.Sequence(normalizers_list)
+
+     def pre_tokenizer(self, replacement, add_prefix_space):
+         return None
+
+
+ SLOW_TO_FAST_CONVERTERS['InternLM2Tokenizer'] = InternLM2Converter
+
+
+ # Modified from transformers.model.llama.tokenization_llama_fast.LlamaTokenizerFast -> InternLM2TokenizerFast
+ class InternLM2TokenizerFast(PreTrainedTokenizerFast):
+     vocab_files_names = VOCAB_FILES_NAMES
+     slow_tokenizer_class = InternLM2Tokenizer
+     padding_side = 'left'
+     model_input_names = ['input_ids', 'attention_mask']
+     _auto_class = 'AutoTokenizer'
+
+     def __init__(
+         self,
+         vocab_file,
+         unk_token='<unk>',
+         bos_token='<s>',
+         eos_token='</s>',
+         pad_token='</s>',
+         sp_model_kwargs: Optional[Dict[str, Any]] = None,
+         add_bos_token=True,
+         add_eos_token=False,
+         decode_with_prefix_space=False,
+         clean_up_tokenization_spaces=False,
+         **kwargs,
+     ):
+         super().__init__(
+             vocab_file=vocab_file,
+             unk_token=unk_token,
+             bos_token=bos_token,
+             eos_token=eos_token,
+             pad_token=pad_token,
+             sp_model_kwargs=sp_model_kwargs,
+             add_bos_token=add_bos_token,
+             add_eos_token=add_eos_token,
+             decode_with_prefix_space=decode_with_prefix_space,
+             clean_up_tokenization_spaces=clean_up_tokenization_spaces,
+             **kwargs,
+         )
+         self._add_bos_token = add_bos_token
+         self._add_eos_token = add_eos_token
+         self.update_post_processor()
+         self.vocab_file = vocab_file
+
+     @property
+     def can_save_slow_tokenizer(self) -> bool:
+         return os.path.isfile(self.vocab_file) if self.vocab_file else False
+
+     def update_post_processor(self):
+         """
+         Updates the underlying post processor with the current `bos_token` and `eos_token`.
+         """
+         bos = self.bos_token
+         bos_token_id = self.bos_token_id
+         if bos is None and self.add_bos_token:
+             raise ValueError('add_bos_token = True but bos_token = None')
+
+         eos = self.eos_token
+         eos_token_id = self.eos_token_id
+         if eos is None and self.add_eos_token:
+             raise ValueError('add_eos_token = True but eos_token = None')
+
+         single = f"{(bos+':0 ') if self.add_bos_token else ''}$A:0{(' '+eos+':0') if self.add_eos_token else ''}"
+         pair = f"{single}{(' '+bos+':1') if self.add_bos_token else ''} $B:1{(' '+eos+':1') if self.add_eos_token else ''}"
+
+         special_tokens = []
+         if self.add_bos_token:
+             special_tokens.append((bos, bos_token_id))
+         if self.add_eos_token:
+             special_tokens.append((eos, eos_token_id))
+         self._tokenizer.post_processor = processors.TemplateProcessing(
+             single=single, pair=pair, special_tokens=special_tokens
+         )
+
+     @property
+     def add_eos_token(self):
+         return self._add_eos_token
+
+     @property
+     def add_bos_token(self):
+         return self._add_bos_token
+
+     @add_eos_token.setter
+     def add_eos_token(self, value):
+         self._add_eos_token = value
+         self.update_post_processor()
+
+     @add_bos_token.setter
+     def add_bos_token(self, value):
+         self._add_bos_token = value
+         self.update_post_processor()
+
+     def save_vocabulary(self, save_directory: str, filename_prefix: Optional[str] = None) -> Tuple[str]:
+         if not self.can_save_slow_tokenizer:
+             raise ValueError(
+                 'Your fast tokenizer does not have the necessary information to save the vocabulary for a slow '
+                 'tokenizer.'
+             )
+
+         if not os.path.isdir(save_directory):
+             logger.error(f'Vocabulary path ({save_directory}) should be a directory')
+             return
+         out_vocab_file = os.path.join(
+             save_directory, (filename_prefix + '-' if filename_prefix else '') + VOCAB_FILES_NAMES['vocab_file']
+         )
+
+         if os.path.abspath(self.vocab_file) != os.path.abspath(out_vocab_file):
+             copyfile(self.vocab_file, out_vocab_file)
+
+         return (out_vocab_file,)
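`update_post_processor` builds the `single` and `pair` template strings passed to `processors.TemplateProcessing`. In those strings, `$A` and `$B` stand for the first and second input sequences, and the `:0`/`:1` suffixes are type IDs. The pure-Python sketch below copies only that string-building logic into a hypothetical `build_templates` helper. This lets you check the effect of the class defaults (`add_bos_token=True`, `add_eos_token=False`) without installing `tokenizers` or loading a vocabulary.

```python
from typing import Optional, Tuple


def build_templates(bos: Optional[str], eos: Optional[str],
                    add_bos: bool, add_eos: bool) -> Tuple[str, str]:
    """Hypothetical helper mirroring the f-strings in update_post_processor."""
    # Same guards as the class: a flag cannot be set without its token.
    if bos is None and add_bos:
        raise ValueError('add_bos_token = True but bos_token = None')
    if eos is None and add_eos:
        raise ValueError('add_eos_token = True but eos_token = None')
    single = f"{(bos + ':0 ') if add_bos else ''}$A:0{(' ' + eos + ':0') if add_eos else ''}"
    pair = f"{single}{(' ' + bos + ':1') if add_bos else ''} $B:1{(' ' + eos + ':1') if add_eos else ''}"
    return single, pair


# Class defaults: BOS on, EOS off.
single, pair = build_templates('<s>', '</s>', add_bos=True, add_eos=False)
print(single)  # <s>:0 $A:0
print(pair)    # <s>:0 $A:0 <s>:1 $B:1

# Enabling EOS appends it to each segment.
print(build_templates('<s>', '</s>', add_bos=True, add_eos=True)[0])  # <s>:0 $A:0 </s>:0
```

On a real `InternLM2TokenizerFast` instance, assigning `add_eos_token` or `add_bos_token` goes through the property setters, which call `update_post_processor` again. That is how the class keeps the post-processor in sync with these flags.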