fix: quantize param in TGI example
#8 by jlzhou · opened

README.md CHANGED
````diff
@@ -208,7 +208,7 @@ It's recommended to use TGI version 1.1.0 or later. The official Docker containe
 Example Docker parameters:
 
 ```shell
---model-id TheBloke/Mistral-7B-OpenOrca-GPTQ --port 3000 --quantize
+--model-id TheBloke/Mistral-7B-OpenOrca-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
 ```
 
 Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
````
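The Python example the README refers to is not included in this diff. A minimal sketch of such a client, assuming the TGI server is listening on port 3000 as configured above and that the model expects the ChatML prompt format used by Mistral-7B-OpenOrca (the endpoint URL, system message, and `max_new_tokens` value are illustrative, not taken from the README):

```python
def build_prompt(system: str, user: str) -> str:
    """Assemble a ChatML prompt as expected by Mistral-7B-OpenOrca."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )


def generate(prompt: str, endpoint: str = "http://127.0.0.1:3000") -> str:
    """Send a prompt to a local TGI endpoint and return the generated text."""
    # Requires huggingface-hub 0.17.0 or later, per the README.
    from huggingface_hub import InferenceClient

    client = InferenceClient(model=endpoint)
    # The prompt plus max_new_tokens must stay within --max-total-tokens (4096):
    # with --max-input-length 3696, up to 400 new tokens always fit.
    return client.text_generation(prompt, max_new_tokens=400)


if __name__ == "__main__":
    prompt = build_prompt("You are a helpful assistant.", "What is TGI?")
    print(generate(prompt))
```

Note how the launch flags in the diff constrain the client: `--max-input-length 3696` caps the prompt, and `--max-total-tokens 4096` caps prompt plus generation, which is why `max_new_tokens` is kept at 400 here.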