czczup committed
Commit 79c0c57
1 Parent(s): fc807c9

Update README.md

Files changed (1): README.md (+7 -11)

README.md CHANGED
@@ -115,7 +115,7 @@ Limitations: Although we have made efforts to ensure the safety of the model dur
 
 ## Quick Start
 
-We provide an example code to run InternVL2-40B using `transformers`.
+We provide an example code to run `InternVL2-40B` using `transformers`.
 
 > Please use transformers>=4.37.2 to ensure the model works normally.
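Note: the `transformers>=4.37.2` requirement above can be checked programmatically before loading the model. A minimal sketch; the `packaging` library is an extra dependency assumed here:

```python
from packaging import version
import transformers

# Fail fast if the installed transformers is too old for the model's remote code.
assert version.parse(transformers.__version__) >= version.parse('4.37.2'), (
    f'transformers>=4.37.2 required, found {transformers.__version__}'
)
```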
 
@@ -150,10 +150,6 @@ model = AutoModel.from_pretrained(
     trust_remote_code=True).eval()
 ```
 
-#### BNB 4-bit Quantization
-
-> **⚠️ Warning:** Due to significant quantization errors with BNB 4-bit quantization on InternViT-6B, the model may produce nonsensical outputs and fail to understand images. Therefore, please avoid using BNB 4-bit quantization.
-
 #### Multiple GPUs
 
 The reason for writing the code this way is to avoid errors that occur during multi-GPU inference due to tensors not being on the same device. By ensuring that the first and last layers of the large language model (LLM) are on the same device, we prevent such errors.
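Note: the "Multiple GPUs" paragraph kept above describes a device-map trick without showing it in this hunk. A hedged sketch of that idea follows; module names such as `language_model.model.layers` and the 60-layer count are assumptions based on the InternVL2-40B remote code, so verify them against the full README.

```python
import math
import torch
from transformers import AutoModel

def split_model(num_layers=60):
    # Sketch: pin the vision tower, embeddings, and the first/last LLM layers
    # to GPU 0 so input and output tensors share a device, then spread the
    # remaining transformer layers across all visible GPUs.
    world_size = torch.cuda.device_count()
    device_map = {}
    # GPU 0 also hosts the ViT, so give it roughly half a share of layers.
    per_gpu = math.ceil(num_layers / (world_size - 0.5))
    counts = [per_gpu] * world_size
    counts[0] = math.ceil(per_gpu * 0.5)
    layer = 0
    for gpu, n in enumerate(counts):
        for _ in range(n):
            if layer < num_layers:
                device_map[f'language_model.model.layers.{layer}'] = gpu
                layer += 1
    # Modules that touch the input/output tensors stay together on GPU 0.
    for name in ('vision_model', 'mlp1',
                 'language_model.model.tok_embeddings',
                 'language_model.model.embed_tokens',
                 'language_model.model.norm',
                 'language_model.lm_head',
                 f'language_model.model.layers.{num_layers - 1}'):
        device_map[name] = 0
    return device_map

model = AutoModel.from_pretrained(
    'OpenGVLab/InternVL2-40B',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map=split_model()).eval()
```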
@@ -443,7 +439,7 @@ response, history = model.chat(tokenizer, pixel_values, question, generation_con
                                num_patches_list=num_patches_list, history=None, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
 
-question = 'Describe this video in detail. Don\'t repeat.'
+question = 'Describe this video in detail.'
 response, history = model.chat(tokenizer, pixel_values, question, generation_config,
                                num_patches_list=num_patches_list, history=history, return_history=True)
 print(f'User: {question}\nAssistant: {response}')
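Note: the `num_patches_list` in this hunk comes from sampling video frames. For context, a hedged sketch of how the video prompt is typically assembled; `load_video` is a helper defined earlier in the README and is assumed here:

```python
import torch

# load_video (defined earlier in the README) is assumed to return stacked
# frame patches plus the per-frame patch counts.
pixel_values, num_patches_list = load_video('./examples/red-panda.mp4', num_segments=8, max_num=1)
pixel_values = pixel_values.to(torch.bfloat16).cuda()
# One <image> placeholder per sampled frame lets the model index frames.
video_prefix = ''.join(f'Frame{i + 1}: <image>\n' for i in range(len(num_patches_list)))
question = video_prefix + 'What is the red panda doing?'
```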
@@ -502,7 +498,7 @@ from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg')
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 response = pipe(('describe this image', image))
 print(response.text)
 ```
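Note: every `pipeline(...)` call in this commit gains `tp=2`, sharding the 40B weights across two GPUs with tensor parallelism. A small sketch for deriving `tp` from the visible GPUs; the power-of-two constraint is an assumption to verify against the LMDeploy docs:

```python
import torch
from lmdeploy import pipeline, TurbomindEngineConfig

# Pick the largest power-of-two tensor-parallel degree that fits the
# visible GPUs (assumed constraint; check the LMDeploy documentation).
n_gpus = torch.cuda.device_count()
tp = 1
while tp * 2 <= n_gpus:
    tp *= 2

pipe = pipeline('OpenGVLab/InternVL2-40B',
                backend_config=TurbomindEngineConfig(session_len=8192, tp=tp))
```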
@@ -521,7 +517,7 @@ from lmdeploy.vl import load_image
 from lmdeploy.vl.constants import IMAGE_TOKEN
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg',
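Note: the `IMAGE_TOKEN` imported in this hunk is used to address each image in the prompt. A minimal sketch of that usage, reusing the hunk's `pipe` and `image_urls`:

```python
from lmdeploy.vl import load_image
from lmdeploy.vl.constants import IMAGE_TOKEN

# pipe and image_urls come from the surrounding example.
images = [load_image(url) for url in image_urls]
# Numbered placeholders let the model refer to "Image-1" and "Image-2".
prompt = f'Image-1: {IMAGE_TOKEN}\nImage-2: {IMAGE_TOKEN}\ndescribe these two images'
response = pipe((prompt, images))
print(response.text)
```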
@@ -543,7 +539,7 @@ from lmdeploy import pipeline, TurbomindEngineConfig
 from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image_urls=[
     "https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg",
@@ -563,7 +559,7 @@ from lmdeploy import pipeline, TurbomindEngineConfig, GenerationConfig
 from lmdeploy.vl import load_image
 
 model = 'OpenGVLab/InternVL2-40B'
-pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192))
+pipe = pipeline(model, backend_config=TurbomindEngineConfig(session_len=8192, tp=2))
 
 image = load_image('https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/demo/resources/human-pose.jpg')
 gen_config = GenerationConfig(top_k=40, top_p=0.8, temperature=0.8)
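Note: the `gen_config` in this hunk feeds LMDeploy's multi-turn `chat` interface; the continuation, whose final `print(sess.response.text)` is visible in the next hunk header, typically looks like the sketch below, reusing the hunk's `pipe`, `image`, and `gen_config`:

```python
# First turn opens a session; later turns pass the session back in.
sess = pipe.chat(('describe this image', image), gen_config=gen_config)
print(sess.response.text)
sess = pipe.chat('What is the woman doing?', session=sess, gen_config=gen_config)
print(sess.response.text)
```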
@@ -578,7 +574,7 @@ print(sess.response.text)
 LMDeploy's `api_server` enables models to be easily packed into services with a single command. The provided RESTful APIs are compatible with OpenAI's interfaces. Below is an example of service startup:
 
 ```shell
-lmdeploy serve api_server OpenGVLab/InternVL2-40B --backend turbomind --server-port 23333
+lmdeploy serve api_server OpenGVLab/InternVL2-40B --backend turbomind --server-port 23333 --tp 2
 ```
 
 To use the OpenAI-style interface, you need to install OpenAI:
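Note: once the server above is running, it can be queried with the standard OpenAI client. A hedged sketch; the port matches the startup command, and the served model name is resolved via the models endpoint:

```python
from openai import OpenAI

client = OpenAI(api_key='YOUR_API_KEY', base_url='http://0.0.0.0:23333/v1')
# Ask the server which model it is serving rather than hard-coding the name.
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[{
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url',
             'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}},
        ],
    }],
    temperature=0.8,
    top_p=0.8)
print(response.choices[0].message.content)
```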
 