Update README.md
README.md (CHANGED)
````diff
@@ -115,7 +115,7 @@ Limitations: Although we have made efforts to ensure the safety of the model dur
 
 ## Quick Start
 
-We provide an example code to run InternVL2-40B using `transformers`.
+We provide an example code to run `InternVL2-40B` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
 
````
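For orientation, the `AutoModel.from_pretrained(...)` call that the next hunk tails into looks roughly like the sketch below. It is assembled from the visible context lines plus standard `transformers` arguments, so treat the exact kwargs as assumptions rather than the repository's verbatim code:

```python
import torch
from transformers import AutoModel, AutoTokenizer

path = 'OpenGVLab/InternVL2-40B'
# bfloat16 weights + low_cpu_mem_usage keep the 40B load tractable;
# trust_remote_code pulls in InternVL's custom modeling classes.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).eval()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
```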
````diff
@@ -150,10 +150,6 @@ model = AutoModel.from_pretrained(
     trust_remote_code=True).eval()
 ```
 
-#### BNB 4-bit Quantization
-
-> **⚠️ Warning:** Due to significant quantization errors with BNB 4-bit quantization on InternViT-6B, the model may produce nonsensical outputs and fail to understand images. Therefore, please avoid using BNB 4-bit quantization.
-
 #### Multiple GPUs
 
 The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors.
````
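The `#### Multiple GPUs` paragraph above is the README's rationale for a hand-built `device_map`. A minimal sketch of the idea follows; the module names mirror InternVL's remote code but should be treated as assumptions here, as should the illustrative layer count. It pins the vision encoder and the first and last LLM layers to GPU 0 and spreads the rest:

```python
import math
import torch

def split_model(num_layers=60):  # 60 is illustrative; use the checkpoint's real depth
    world_size = max(torch.cuda.device_count(), 1)
    per_gpu = math.ceil(num_layers / world_size)
    # Spread the transformer layers over the GPUs in contiguous chunks.
    device_map = {f'language_model.model.layers.{i}': min(i // per_gpu, world_size - 1)
                  for i in range(num_layers)}
    # Keep the vision tower, embeddings, final norm, and output head together
    # on GPU 0 so inputs and logits never straddle devices.
    for name in ('vision_model', 'mlp1',
                 'language_model.model.embed_tokens',
                 'language_model.model.norm',
                 'language_model.lm_head'):
        device_map[name] = 0
    device_map[f'language_model.model.layers.{num_layers - 1}'] = 0  # last layer joins the first
    return device_map

# Pass device_map=split_model() to AutoModel.from_pretrained(...) as above.
```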
````diff
@@ -443,7 +439,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_con
                                num_patches_list=num_patches_list, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 
-question = 'Describe this video in detail.
+question = 'Describe this video in detail.'
 response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                                num_patches_list=num_patches_list, history=history, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
````
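The snippet this hunk patches consumes `pixel_values` and `num_patches_list` produced by the README's video-loading step. As a hedged illustration of where those come from (`preprocess_frame` is a hypothetical stand-in for the README's actual frame transform): each sampled frame becomes a stack of image patches, the stacks are concatenated, and the per-frame counts let `model.chat` split them back apart:

```python
import torch

def build_video_inputs(frames):
    """frames: raw video frames (e.g. PIL images) sampled from the clip."""
    # preprocess_frame is a hypothetical stand-in for the README's transform;
    # assume it returns a (n_patches, 3, 448, 448) tensor per frame.
    patch_stacks = [preprocess_frame(f) for f in frames]
    pixel_values = torch.cat(patch_stacks, dim=0).to(torch.bfloat16).cuda()
    num_patches_list = [p.size(0) for p in patch_stacks]  # per-frame patch counts
    return pixel_values, num_patches_list
```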
````diff
@@ -502,7 +498,7 @@ from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 response = pipe(('describe this image', image))
 print(response.text)
 ```
````
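The `tp=2` added here and in the following hunks shards the model across two GPUs with tensor parallelism, so the pipeline now requires two visible devices. A small defensive sketch of the same call (the assertion is our addition, not part of the README):

```python
import torch
from lmdeploy import pipeline, TurbomindEngineConfig

tp = 2  # tensor-parallel degree introduced by this commit
assert torch.cuda.device_count() >= tp, 'tp=2 needs at least two visible GPUs'
pipe = pipeline('OpenGVLab/InternVL2-40B',
                backend_config=TurbomindEngineConfig(session_len=8192, tp=tp))
```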
````diff
@@ -521,7 +517,7 @@ from lmdeploy.vl import load_image
 from lmdeploy.vl.constants import IMAGE_TOKEN
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
````
````diff
@@ -543,7 +539,7 @@ from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
````
````diff
@@ -563,7 +559,7 @@ from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
 from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
 gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
````
````diff
@@ -578,7 +574,7 @@ print(sess.response.text)
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL2-40B --backend turbomind --server-port 23333
+lmdeploy serve api_server OpenGVLab/InternVL2-40B --backend turbomind --server-port 23333 --tp 2
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
````
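Once the server is up, any OpenAI-compatible client can talk to it. A sketch of a chat-completion request against the command above (the `base_url` and port follow that command; the API key is a placeholder, since a local server typically accepts any non-empty string):

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
model_name = client.models.list().data[0].id  # the served InternVL2-40B
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url',
             'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
    temperature=0.8,
    top_p=0.8)
print(response.choices[0].message.content)
```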