TheBloke committed
Commit 7202dc3
1 Parent(s): eb57837

Update README.md

Files changed (1):
  1. README.md +1 -39
README.md CHANGED
@@ -49,7 +49,7 @@ AWQ is an efficient, accurate and blazing-fast low-bit weight quantization method
 
 These are experimental first AWQs for the brand-new model format, Mistral.
 
- As of September 29th 2023, they are supported by AutoAWQ, and vLLM (version 0.2).
+ As of September 29th 2023, they are only supported by AutoAWQ (version 0.1.1+).
 
 Using this model from AutoAWQ requires installing both AutoAWQ and Transformers from GitHub. More details are below.
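For reference, the GitHub installs mentioned in that context line would look roughly like the sketch below. The repository URLs are assumptions about where Transformers and AutoAWQ are normally hosted, not something this commit specifies:

```shell
# Assumed repository locations; check each project's own install instructions.
pip install git+https://github.com/huggingface/transformers.git
pip install git+https://github.com/casper-hansen/AutoAWQ.git
```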
 
@@ -86,44 +86,6 @@ Models are released as sharded safetensors files.
 
 <!-- README_AWQ.md-provided-files end -->
 
- <!-- README_AWQ.md-use-from-vllm start -->
- ## Serving this model from vLLM
-
- Make sure you are using vLLM version 0.2.
-
- Documentation on installing and using vLLM [can be found here](https://vllm.readthedocs.io/en/latest/).
-
- When using vLLM as a server, pass the `--quantization awq` parameter, for example:
-
- ```shell
- python3 -m vllm.entrypoints.api_server --model TheBloke/Mistral-7B-v0.1-AWQ --quantization awq --dtype float16
- ```
-
- When using vLLM from Python code, pass the `quantization="awq"` parameter, for example:
-
- ```python
- from vllm import LLM, SamplingParams
-
- prompts = [
-     "Hello, my name is",
-     "The president of the United States is",
-     "The capital of France is",
-     "The future of AI is",
- ]
- sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
-
- llm = LLM(model="TheBloke/Mistral-7B-v0.1-AWQ", quantization="awq", dtype="float16")
-
- outputs = llm.generate(prompts, sampling_params)
-
- # Print the prompt and generated text for each completion.
- for output in outputs:
-     prompt = output.prompt
-     generated_text = output.outputs[0].text
-     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
- ```
- <!-- README_AWQ.md-use-from-vllm end -->
-
 <!-- README_AWQ.md-use-from-python start -->
 ## How to use this AWQ model from Python code
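The Python usage section itself falls outside this diff's context. As a minimal sketch of what loading the model through AutoAWQ 0.1.1+ typically looks like: the `AutoAWQForCausalLM.from_quantized` call is AutoAWQ's public API, but the exact arguments here are assumptions, not the README's own example.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_name_or_path = "TheBloke/Mistral-7B-v0.1-AWQ"

# Load the AWQ-quantized weights and the matching tokenizer.
model = AutoAWQForCausalLM.from_quantized(model_name_or_path,
                                          fuse_layers=True,
                                          safetensors=True)
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)

prompt = "The future of AI is"
tokens = tokenizer(prompt, return_tensors="pt").input_ids.cuda()

# Sample a completion from the quantized model.
output = model.generate(tokens,
                        do_sample=True,
                        temperature=0.7,
                        top_p=0.95,
                        max_new_tokens=128)
print(tokenizer.decode(output[0]))
```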
 
 