Commit b252962 by czczup (parent: bba800e)

Upload folder using huggingface_hub

Files changed (1):
  1. README.md +11 -3
README.md CHANGED
@@ -3,11 +3,19 @@ license: mit
 pipeline_tag: image-text-to-text
 ---
 
+# InternVL2-2B-AWQ
+
+[\[📂 GitHub\]](https://github.com/OpenGVLab/InternVL) [\[🆕 Blog\]](https://internvl.github.io/blog/) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821)
+
+[\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[🚀 Quick Start\]](#quick-start) [\[📖 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) \[🌟 [魔搭社区](https://modelscope.cn/organization/OpenGVLab) | [教程](https://mp.weixin.qq.com/s/OUaVLkxlk1zhFb1cvMCFjg) \]
+
+## Introduction
+
 <div align="center">
   <img src="https://raw.githubusercontent.com/InternLM/lmdeploy/0be9e7ab6fe9a066cfb0a09d0e0c8d2e28435e58/resources/lmdeploy-logo.svg" width="450"/>
 </div>
 
-# INT4 Weight-only Quantization and Deployment (W4A16)
+### INT4 Weight-only Quantization and Deployment (W4A16)
 
 LMDeploy adopts the [AWQ](https://arxiv.org/abs/2306.00978) algorithm for 4-bit weight-only quantization. Backed by a high-performance CUDA kernel, inference with the 4-bit quantized model runs up to 2.4x faster than FP16.
 
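W4A16 here means the weights are stored as 4-bit integers with a small FP16 scale and zero-point per group, while activations stay in FP16. The toy NumPy sketch below illustrates only that group-wise storage scheme; it is not LMDeploy's AWQ implementation, which additionally searches activation-aware per-channel scales and runs a fused CUDA kernel. The group size of 128 is an assumption.

```python
import numpy as np

GROUP_SIZE = 128  # assumed group size; AWQ-style setups commonly use 128


def quantize_w4(w: np.ndarray):
    """Quantize an [out, in] FP16 weight matrix into 4-bit groups."""
    out_dim, in_dim = w.shape
    w = w.reshape(out_dim, in_dim // GROUP_SIZE, GROUP_SIZE).astype(np.float32)
    w_min = w.min(axis=-1, keepdims=True)
    w_max = w.max(axis=-1, keepdims=True)
    scale = np.maximum((w_max - w_min) / 15.0, 1e-8)   # 4 bits -> 16 levels
    zero = np.clip(np.round(-w_min / scale), 0, 15)    # per-group zero-point
    q = np.clip(np.round(w / scale) + zero, 0, 15).astype(np.uint8)
    return q, scale.astype(np.float16), zero.astype(np.uint8)


def dequantize_w4(q, scale, zero):
    """Recover an FP16 weight matrix from the 4-bit groups at compute time."""
    w = (q.astype(np.float32) - zero.astype(np.float32)) * scale.astype(np.float32)
    return w.reshape(q.shape[0], -1).astype(np.float16)


w = np.random.randn(256, 512).astype(np.float16)
q, scale, zero = quantize_w4(w)
w_hat = dequantize_w4(q, scale, zero)
print("max abs reconstruction error:", np.abs(w.astype(np.float32) - w_hat.astype(np.float32)).max())
```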
 
@@ -34,7 +42,7 @@ This article comprises the following sections:
 
 <!-- tocstop -->
 
-## Inference
+### Inference
 
 Using the following code, you can perform batched offline inference with the quantized model:
 
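The inference snippet referred to above is elided in this diff (only its final `print(response.text)` line survives in the next hunk header). A minimal sketch of what such batched offline inference can look like, assuming lmdeploy's vision-language `pipeline` API with the TurboMind backend; the model id and image URL are placeholders:

```python
from lmdeploy import pipeline, TurbomindEngineConfig
from lmdeploy.vl import load_image

# model_format='awq' asks the TurboMind backend to load the 4-bit AWQ weights
pipe = pipeline('OpenGVLab/InternVL2-2B-AWQ',
                backend_config=TurbomindEngineConfig(model_format='awq'))

image = load_image('https://example.com/tiger.jpeg')  # placeholder image URL
response = pipe(('describe this image', image))
print(response.text)

# Batched inference: pass a list of (prompt, image) pairs in one call
responses = pipe([('describe this image', image),
                  ('how many animals are in the image?', image)])
for r in responses:
    print(r.text)
```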
 
@@ -56,7 +64,7 @@ print(response.text)
 
 For more information about the pipeline parameters, please refer to [here](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/pipeline.md).
 
-## Service
+### Service
 
 To deploy InternVL2 as an API, please set up the chat template configuration first. Create the following JSON file, `chat_template.json`.
 
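Neither the contents of `chat_template.json` nor the serving command appear in this diff. A minimal sketch of the setup it describes, assuming lmdeploy's chat-template JSON format and the `lmdeploy serve api_server` CLI; the template name, flags, and port below are illustrative assumptions, not content taken from the README:

```python
import json

# Hypothetical chat-template config; the real keys/values belong to the
# README content that this diff does not show.
chat_template = {"model_name": "internvl-internlm2"}

with open("chat_template.json", "w") as f:
    json.dump(chat_template, f, indent=4)

# Then launch the OpenAI-compatible server from the shell, e.g.:
#   lmdeploy serve api_server OpenGVLab/InternVL2-2B-AWQ \
#       --model-format awq --chat-template chat_template.json --server-port 23333
```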
 
 