TheBloke committed
Commit 01ee7bf
1 Parent(s): ef6cacd

Upload README.md

Files changed (1):
  1. README.md (+19 -53)
README.md CHANGED
@@ -49,6 +49,11 @@ quantized_by: TheBloke
 
 This repo contains GPTQ model files for [OpenOrca's Mixtral SlimOrca 8X7B](https://huggingface.co/Open-Orca/Mixtral-SlimOrca-8x7B).
 
+Mixtral GPTQs currently require:
+* Transformers 4.36.0 or later
+* either AutoGPTQ 0.6 compiled from source, or
+* Transformers 4.37.0.dev0, installed from GitHub with: `pip3 install git+https://github.com/huggingface/transformers`
+
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
 <!-- description end -->
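For review convenience, here is a small, hypothetical pre-flight check (not part of the README) that reports whether an environment satisfies the version floors listed in the new description; the package names and minimums come from the list above, everything else is illustrative:

```python
# Illustrative pre-flight check for the requirements added above.
# Version floors come from the README text; the helper itself is hypothetical.
from importlib.metadata import version, PackageNotFoundError
from packaging.version import Version

def check(dist_name: str, minimum: str) -> None:
    """Print whether an installed distribution meets a minimum version."""
    try:
        installed = Version(version(dist_name))
    except PackageNotFoundError:
        print(f"{dist_name}: not installed (need >= {minimum})")
        return
    status = "OK" if installed >= Version(minimum) else "need >= " + minimum
    print(f"{dist_name}: {installed} ({status})")

check("transformers", "4.36.0")   # or a 4.37.0.dev0 install from GitHub
check("auto-gptq", "0.6.0.dev0")  # source builds of the 0.6 line report a .dev version
check("optimum", "1.16.0")        # floor taken from the Python-example section below
```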
@@ -81,14 +86,8 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 
 GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.
 
-These GPTQ models are known to work in the following inference servers/webuis.
-
-- [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
-- [KoboldAI United](https://github.com/henk717/koboldai)
-- [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
-- [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
+Mixtral GPTQs currently have special requirements - see Description above.
 
-This may not be a complete list; if you know of others, please let me know!
 <!-- README_GPTQ.md-compatible clients end -->
 
 <!-- README_GPTQ.md-provided-files start -->
@@ -196,6 +195,12 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
 
+**NOTE**: Requires:
+
+* Transformers 4.36.0, or Transformers 4.37.0.dev0 from GitHub
+* either AutoGPTQ 0.6 compiled from source, used with `Loader: AutoGPTQ`,
+* or `Loader: Transformers`, if you installed Transformers from GitHub: `pip3 install git+https://github.com/huggingface/transformers`
+
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
 It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
@@ -222,54 +227,18 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 <!-- README_GPTQ.md-use-from-tgi start -->
 ## Serving this model from Text Generation Inference (TGI)
 
-It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
-
-Example Docker parameters:
-
-```shell
---model-id TheBloke/Mixtral-SlimOrca-8x7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
-```
-
-Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
-
-```shell
-pip3 install huggingface-hub
-```
-
-```python
-from huggingface_hub import InferenceClient
-
-endpoint_url = "https://your-endpoint-url-here"
+Not currently supported for Mixtral models.
 
-prompt = "Tell me about AI"
-prompt_template=f'''<|im_start|>system
-{system_message}<|im_end|>
-<|im_start|>user
-{prompt}<|im_end|>
-<|im_start|>assistant
-'''
-
-client = InferenceClient(endpoint_url)
-response = client.text_generation(prompt,
-                                  max_new_tokens=128,
-                                  do_sample=True,
-                                  temperature=0.7,
-                                  top_p=0.95,
-                                  top_k=40,
-                                  repetition_penalty=1.1)
-
-print(f"Model output: {response}")
-```
 <!-- README_GPTQ.md-use-from-tgi end -->
 <!-- README_GPTQ.md-use-from-python start -->
 ## Python code example: inference from this GPTQ model
 
 ### Install the necessary packages
 
-Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
+Requires: Transformers 4.37.0.dev0 from GitHub, Optimum 1.16.0 or later, and AutoGPTQ 0.5.1 or later.
 
 ```shell
-pip3 install --upgrade transformers optimum
+pip3 install --upgrade "git+https://github.com/huggingface/transformers" optimum
 # If using PyTorch 2.1 + CUDA 12.x:
 pip3 install --upgrade auto-gptq
 # or, if using PyTorch 2.1 + CUDA 11.x:
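The install step branches on the local PyTorch/CUDA combination, and this hunk is truncated before the CUDA 11.x command, so the exact wheel index is not shown here. As an illustrative companion (the helper is hypothetical; only the CUDA 12.x command is quoted from the hunk), this snippet reports which variant applies:

```python
# Hypothetical helper mirroring the shell comments in the hunk above:
# report which auto-gptq install variant matches the local PyTorch build.
import torch

cuda = torch.version.cuda or ""  # e.g. "12.1" or "11.8"; empty for CPU-only builds
if cuda.startswith("12"):
    print("pip3 install --upgrade auto-gptq")
elif cuda.startswith("11"):
    # The CUDA 11.x command is cut off in this hunk; see the full README for it.
    print("pip3 install --upgrade auto-gptq plus the CUDA 11 wheel index from the README")
else:
    print("No prebuilt wheel noted here; use the source install shown below.")
```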
@@ -282,8 +251,7 @@ If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Lik
 pip3 uninstall -y auto-gptq
 git clone https://github.com/PanQiWei/AutoGPTQ
 cd AutoGPTQ
-git checkout v0.5.1
-pip3 install .
+DISABLE_QIGEN=1 pip3 install .
 ```
 
 ### Example Python code
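After a source build like the one above, a quick smoke test (illustrative; it assumes the build completed and that the distribution is named `auto-gptq` with import name `auto_gptq`) confirms the package is usable:

```python
# Illustrative smoke test for the source build above: the import fails fast
# if the compiled extension did not build, and the version confirms the 0.6 line.
from importlib.metadata import version

import auto_gptq  # raises ImportError if the source install failed

print("auto-gptq:", version("auto-gptq"))  # a source build typically reports e.g. "0.6.0.dev0"
```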
@@ -301,7 +269,8 @@ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
 
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 
-prompt = "Tell me about AI"
+prompt = "Write a story about llamas"
+system_message = "You are a story writing assistant"
 prompt_template=f'''<|im_start|>system
 {system_message}<|im_end|>
 <|im_start|>user
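The added lines fix a latent bug in the example: the ChatML template interpolates `system_message`, which the old code never defined. For context, a minimal sketch of how these pieces combine end to end (the model ID is this repo's; `device_map="auto"` assumes accelerate is installed; the sampling values are illustrative, not quoted from the README):

```python
# Minimal end-to-end sketch of the README's Python example; sampling
# parameters and device placement are illustrative assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Mixtral-SlimOrca-8x7B-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

prompt = "Write a story about llamas"
system_message = "You are a story writing assistant"
# ChatML prompt format, exactly as in the hunk above.
prompt_template = f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
'''

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=512, do_sample=True,
                        temperature=0.7, top_p=0.95, top_k=40)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```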
@@ -337,11 +306,8 @@ print(pipe(prompt_template)[0]['generated_text'])
 <!-- README_GPTQ.md-compatibility start -->
 ## Compatibility
 
-The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.
-
-[ExLlama](https://github.com/turboderp/exllama) is compatible with Llama and Mistral models in 4-bit. Please see the Provided Files table above for per-file compatibility.
+The files provided are tested to work with AutoGPTQ 0.6 (compiled from source) and Transformers 4.37.0 (installed from GitHub).
 
-For a list of clients/servers, please see "Known compatible clients / servers", above.
 <!-- README_GPTQ.md-compatibility end -->
 
 <!-- footer start -->
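Since the compatibility claim now hinges on recent Transformers support for the Mixtral architecture, one quick, illustrative probe is to load the repo's config and check the reported `model_type` (a Transformers build that predates Mixtral support fails at this step):

```python
# Illustrative compatibility probe: a Transformers version without Mixtral
# support cannot parse this config's "mixtral" model_type.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("TheBloke/Mixtral-SlimOrca-8x7B-GPTQ")
print(config.model_type)  # expect "mixtral" when support is present
```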
 