fix: quantize param in TGI example
#8 by jlzhou · opened

README.md CHANGED
````diff
@@ -208,7 +208,7 @@ It's recommended to use TGI version 1.1.0 or later. The official Docker containe
 Example Docker parameters:
 
 ```shell
---model-id TheBloke/Mistral-7B-OpenOrca-GPTQ --port 3000 --quantize
+--model-id TheBloke/Mistral-7B-OpenOrca-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
 ```
 
 Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
````
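The Python example the README refers to is not included in this diff. A minimal sketch of such a client, assuming the TGI server is listening on port 3000 as configured above and that the model expects the ChatML prompt format used by Mistral-7B-OpenOrca (the endpoint URL, system message, and `max_new_tokens` value are illustrative, not taken from the README):

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt as expected by Mistral-7B-OpenOrca."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def generate(prompt: str, endpoint: str = "http://127.0.0.1:3000") -> str:
    """Send a prompt to a local TGI endpoint and return the generated text."""
    # Requires huggingface-hub 0.17.0 or later, per the README.
    from huggingface_hub import InferenceClient

    client = InferenceClient(model=endpoint)
    # The prompt plus max_new_tokens must stay within --max-total-tokens (4096):
    # with --max-input-length 3696, up to 400 new tokens always fit.
    return client.text_generation(prompt, max_new_tokens=400)


if __name__ == "__main__":
    prompt = build_prompt("You are a helpful assistant.", "What is TGI?")
    print(generate(prompt))
```

Note how the launch flags in the diff constrain the client: `--max-input-length 3696` caps the prompt, and `--max-total-tokens 4096` caps prompt plus generation, which is why `max_new_tokens` is kept at 400 here.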