TheBloke committed
Commit 7a63247
1 Parent(s): e9abd05

Upload README.md

Files changed (1): README.md (+18 -61)

README.md CHANGED
@@ -49,14 +49,10 @@ quantized_by: TheBloke

This repo contains GPTQ model files for [OpenOrca's Mixtral SlimOrca 8X7B](https://huggingface.co/Open-Orca/Mixtral-SlimOrca-8x7B).

- ## Requires AutoGPTQ PR + transformers 4.36.0
-
- These files were made with, and will currently only work with, AutoGPTQ 0.6 compiled from source. A full AutoGPTQ 0.6 release is coming very soon.
-
- You also need Transformers version 4.36.0, released December 11th.
-
- Transformers GPTQ support is also available by installing Transformers from Github - more details on this soon.
-
+ Mixtral GPTQs currently require:
+ * Transformers 4.36.0 or later
+ * either AutoGPTQ 0.6 compiled from source, or
+ * Transformers 4.37.0.dev0, installed from Github with: `pip3 install git+https://github.com/huggingface/transformers`

Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
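Since the requirements above are version-sensitive, a quick way to confirm an environment meets them before loading the model is to query the installed package metadata. A minimal sketch (the thresholds are taken from the list above; package names are as published on PyPI; `importlib.metadata` is standard library):

```python
# Report installed versions against the Mixtral GPTQ requirements listed above.
from importlib.metadata import PackageNotFoundError, version

def report(package, minimum):
    try:
        print(f"{package}: {version(package)} (need >= {minimum})")
    except PackageNotFoundError:
        print(f"{package}: not installed (need >= {minimum})")

report("transformers", "4.36.0")  # or a 4.37.0.dev0 install from Github
report("auto-gptq", "0.6.0")      # only needed when loading via AutoGPTQ
```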
@@ -90,14 +86,8 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for

GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.

- These GPTQ models are known to work in the following inference servers/webuis.
-
- - [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
- - [KoboldAI United](https://github.com/henk717/koboldai)
- - [LoLLMS Web UI](https://github.com/ParisNeo/lollms-webui)
- - [Hugging Face Text Generation Inference (TGI)](https://github.com/huggingface/text-generation-inference)
-
- This may not be a complete list; if you know of others, please let me know!
+ Mixtral GPTQs currently have special requirements - see Description above.
+
<!-- README_GPTQ.md-compatible clients end -->

<!-- README_GPTQ.md-provided-files start -->
@@ -205,6 +195,12 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
<!-- README_GPTQ.md-text-generation-webui start -->
## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)

+ **NOTE**: Requires:
+
+ * Transformers 4.36.0, or Transformers 4.37.0.dev0 from Github
+ * either AutoGPTQ 0.6 compiled from source, with `Loader: AutoGPTQ`,
+ * or `Loader: Transformers`, if you installed Transformers from Github: `pip3 install git+https://github.com/huggingface/transformers`
+
Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

It is strongly recommended to use the text-generation-webui one-click-installers unless you're sure you know how to make a manual install.
@@ -231,54 +227,18 @@ It is strongly recommended to use the text-generation-webui one-click-installers
<!-- README_GPTQ.md-use-from-tgi start -->
## Serving this model from Text Generation Inference (TGI)

- It's recommended to use TGI version 1.1.0 or later. The official Docker container is: `ghcr.io/huggingface/text-generation-inference:1.1.0`
-
- Example Docker parameters:
-
- ```shell
- --model-id TheBloke/Mixtral-SlimOrca-8x7B-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
- ```
-
- Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
-
- ```shell
- pip3 install huggingface-hub
- ```
-
- ```python
- from huggingface_hub import InferenceClient
-
- endpoint_url = "https://your-endpoint-url-here"
-
- prompt = "Tell me about AI"
- prompt_template=f'''<|im_start|>system
- {system_message}<|im_end|>
- <|im_start|>user
- {prompt}<|im_end|>
- <|im_start|>assistant
- '''
-
- client = InferenceClient(endpoint_url)
- response = client.text_generation(prompt,
-                                   max_new_tokens=128,
-                                   do_sample=True,
-                                   temperature=0.7,
-                                   top_p=0.95,
-                                   top_k=40,
-                                   repetition_penalty=1.1)
-
- print(f"Model output: {response}")
- ```
+ Not currently supported for Mixtral models.
+
<!-- README_GPTQ.md-use-from-tgi end -->
<!-- README_GPTQ.md-use-from-python start -->
## Python code example: inference from this GPTQ model

### Install the necessary packages

- Requires: Transformers 4.33.0 or later, Optimum 1.12.0 or later, and AutoGPTQ 0.4.2 or later.
+ Requires: Transformers 4.37.0.dev0 from Github, Optimum 1.16.0 or later, and AutoGPTQ 0.5.1 or later.

```shell
- pip3 install --upgrade transformers optimum
+ pip3 install --upgrade "git+https://github.com/huggingface/transformers" optimum
# If using PyTorch 2.1 + CUDA 12.x:
pip3 install --upgrade auto-gptq
# or, if using PyTorch 2.1 + CUDA 11.x:
@@ -291,8 +251,7 @@ If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Lik
pip3 uninstall -y auto-gptq
git clone https://github.com/PanQiWei/AutoGPTQ
cd AutoGPTQ
- git checkout v0.5.1
- pip3 install .
+ DISABLE_QIGEN=1 pip3 install .
```

### Example Python code
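Before running the example that follows, it is worth confirming that a source-built AutoGPTQ imports cleanly; a minimal sanity check, assuming the standard `auto_gptq` package layout:

```python
# Sanity check for an AutoGPTQ source build: the import fails if the build is broken.
import auto_gptq
from auto_gptq import AutoGPTQForCausalLM

print("auto-gptq version:", auto_gptq.__version__)
```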
@@ -310,7 +269,8 @@ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

- prompt = "Tell me about AI"
+ prompt = "Write a story about llamas"
+ system_message = "You are a story writing assistant"
prompt_template=f'''<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
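The f-string above hard-codes the ChatML format. If the repo's tokenizer ships a ChatML chat template (an assumption worth checking in its tokenizer_config.json), the same prompt can be built more robustly with `apply_chat_template`, reusing `tokenizer`, `prompt`, and `system_message` from the example:

```python
# Alternative to the manual f-string: let the tokenizer render its chat template.
messages = [
    {"role": "system", "content": system_message},
    {"role": "user", "content": prompt},
]
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,              # return a string instead of token IDs
    add_generation_prompt=True,  # append the assistant header for generation
)
print(prompt_text)
```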
@@ -346,11 +306,8 @@ print(pipe(prompt_template)[0]['generated_text'])
<!-- README_GPTQ.md-compatibility start -->
## Compatibility

- The files provided are tested to work with Transformers. For non-Mistral models, AutoGPTQ can also be used directly.
-
- [ExLlama](https://github.com/turboderp/exllama) is compatible with Llama and Mistral models in 4-bit. Please see the Provided Files table above for per-file compatibility.
+ The files provided are tested to work with AutoGPTQ 0.6 (compiled from source) and Transformers 4.37.0 (installed from Github).

- For a list of clients/servers, please see "Known compatible clients / servers", above.
<!-- README_GPTQ.md-compatibility end -->

<!-- footer start -->