TheBloke committed
Commit b260f6a
1 Parent(s): 4e14067

Upload README.md

Files changed (1)
  1. README.md +132 -11

README.md CHANGED
@@ -11,8 +11,34 @@ license: apache-2.0
  model_creator: Mistral AI_
  model_name: Mixtral 8X7B Instruct v0.1
  model_type: mixtral
- prompt_template: '[INST] {prompt} [/INST] '
  quantized_by: TheBloke
  ---
  <!-- markdownlint-disable MD041 -->
 
@@ -40,15 +66,12 @@ quantized_by: TheBloke
  <!-- description start -->
  # Description
 
- This repo contains **EXPERIMENTAL** GPTQ model files for [Mistral AI_'s Mixtral 8X7B Instruct v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
-
- ## Requires AutoGPTQ PR + transformers 4.36.0
 
- These files were made with, and will currently only work with, this AutoGPTQ PR: https://github.com/LaaZa/AutoGPTQ/tree/Mixtral-fix
-
- To test, please build AutoGPTQ from source using that PR. You also need Transformers version 4.36.0, released December 11th.
-
- Transformers support has just arrived also via two PRs - and is expected in main Transformers + Optimum tomorrow (Dec 12th).
 
  Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
@@ -56,7 +79,7 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
  <!-- repositories-available start -->
  ## Repositories available
 
- * AWQ coming soon
  * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ)
  * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF)
  * [Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
@@ -67,9 +90,22 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 
  ```
  [INST] {prompt} [/INST]
  ```
  <!-- prompt-template end -->
 
  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files, and GPTQ parameters
 
@@ -174,7 +210,11 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
  <!-- README_GPTQ.md-text-generation-webui start -->
  ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
 
- **WILL CURRENTLY ONLY WORK WITH AUTOGPTQ LOADER, WITH AUTOGPTQ COMPILED FROM PR LISTED ABOVE**
 
  Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
@@ -199,6 +239,87 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 
  <!-- README_GPTQ.md-text-generation-webui end -->
 
 
  <!-- footer start -->
  <!-- 200823 -->
 
  model_creator: Mistral AI_
  model_name: Mixtral 8X7B Instruct v0.1
  model_type: mixtral
+ prompt_template: '[INST] {prompt} [/INST]
+
+   '
  quantized_by: TheBloke
+ widget:
+ - output:
+     text: 'Arr, shiver me timbers! Ye have a llama on yer lawn, ye say? Well, that
+       be a new one for me! Here''s what I''d suggest, arr:
+
+
+       1. Firstly, ensure yer safety. Llamas may look gentle, but they can be protective
+       if they feel threatened.
+
+       2. Try to make the area less appealing to the llama. Remove any food sources
+       or water that might be attracting it.
+
+       3. Contact local animal control or a wildlife rescue organization. They be the
+       experts and can provide humane ways to remove the llama from yer property.
+
+       4. If ye have any experience with animals, you could try to gently herd the
+       llama towards a nearby field or open space. But be careful, arr!
+
+
+       Remember, arr, it be important to treat the llama with respect and care. It
+       be a creature just trying to survive, like the rest of us.'
+   text: '[INST] You are a pirate chatbot who always responds with Arr and pirate speak!
+
+     There''s a llama on my lawn, how can I get rid of him? [/INST]'
  ---
  <!-- markdownlint-disable MD041 -->

  <!-- description start -->
  # Description
 
+ This repo contains GPTQ model files for [Mistral AI_'s Mixtral 8X7B Instruct v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1).
+
+ Mixtral GPTQs currently require (a quick environment check is sketched after this list):
+ * Transformers 4.36.0 or later, and
+ * either AutoGPTQ 0.6 compiled from source, or
+ * Transformers 4.37.0.dev0 installed from Github with: `pip3 install git+https://github.com/huggingface/transformers`
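A minimal environment check for the above, as a sketch (the version bounds come from the list; assumes the `packaging` package is available):

```python
from importlib.metadata import PackageNotFoundError, version
from packaging.version import parse

# Mixtral GPTQ support needs Transformers 4.36.0 or later
assert parse(version("transformers")) >= parse("4.36.0"), version("transformers")

# The AutoGPTQ loader path additionally needs auto-gptq 0.6, built from source at the time of writing
try:
    print("auto-gptq:", version("auto-gptq"))
except PackageNotFoundError:
    print("auto-gptq not installed; load via Transformers 4.37.0.dev0+ instead")
```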
 
 
  Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
  <!-- repositories-available start -->
  ## Repositories available
 
+ * [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-AWQ)
  * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ)
  * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF)
  * [Mistral AI_'s original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 
  ```
  [INST] {prompt} [/INST]
+
  ```
+
  <!-- prompt-template end -->
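For clarity, filling the template in Python; a minimal sketch (the example prompt is a stand-in):

```python
prompt = "Tell me about AI"
prompt_template = f"[INST] {prompt} [/INST]"
print(prompt_template)  # [INST] Tell me about AI [/INST]
```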
 
+
+
+ <!-- README_GPTQ.md-compatible clients start -->
+ ## Known compatible clients / servers
+
+ GPTQ models are currently supported on Linux (NVidia/AMD) and Windows (NVidia only). macOS users: please use GGUF models.
+
+ Mixtral GPTQs currently have special requirements - see Description above.
+
+ <!-- README_GPTQ.md-compatible clients end -->
+
  <!-- README_GPTQ.md-provided-files start -->
  ## Provided files, and GPTQ parameters
 
 
  <!-- README_GPTQ.md-text-generation-webui start -->
  ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
 
+ **NOTE**: Requires:
+
+ * Transformers 4.36.0, or Transformers 4.37.0.dev0 from Github
+ * either AutoGPTQ 0.6 compiled from source, with `Loader: AutoGPTQ`,
+ * or `Loader: Transformers`, if you installed Transformers from Github: `pip3 install git+https://github.com/huggingface/transformers`
 
  Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
 
 
  <!-- README_GPTQ.md-text-generation-webui end -->
 
+ <!-- README_GPTQ.md-use-from-tgi start -->
+ ## Serving this model from Text Generation Inference (TGI)
+
+ Not currently supported for Mixtral models.
+
+ <!-- README_GPTQ.md-use-from-tgi end -->
+ <!-- README_GPTQ.md-use-from-python start -->
+ ## Python code example: inference from this GPTQ model
+
+ ### Install the necessary packages
+
+ Requires: Transformers 4.37.0.dev0 from Github, Optimum 1.16.0 or later, and AutoGPTQ 0.5.1 or later.
+
+ ```shell
+ pip3 install --upgrade "git+https://github.com/huggingface/transformers" optimum
+ # If using PyTorch 2.1 + CUDA 12.x:
+ pip3 install --upgrade auto-gptq
+ # or, if using PyTorch 2.1 + CUDA 11.x:
+ pip3 install --upgrade auto-gptq --extra-index-url https://huggingface.github.io/autogptq-index/whl/cu118/
+ ```
+
+ If you are using PyTorch 2.0, you will need to install AutoGPTQ from source. Likewise if you have problems with the pre-built wheels, you should try building from source:
+
+ ```shell
+ pip3 uninstall -y auto-gptq
+ git clone https://github.com/PanQiWei/AutoGPTQ
+ cd AutoGPTQ
+ DISABLE_QIGEN=1 pip3 install .
+ ```
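To confirm which build ended up installed, a quick sketch using only the standard-library `importlib.metadata`:

```python
from importlib.metadata import version

# Expect 0.5.1 or later, or the version just built from source
print("auto-gptq:", version("auto-gptq"))
```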
+
+ ### Example Python code
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
+
+ model_name_or_path = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"
+ # To use a different branch, change revision
+ # For example: revision="gptq-4bit-128g-actorder_True"
+ model = AutoModelForCausalLM.from_pretrained(model_name_or_path,
+                                              device_map="auto",
+                                              trust_remote_code=False,
+                                              revision="main")
+
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
+
+ prompt = "Write a story about llamas"
+ system_message = "You are a story writing assistant"  # not used: this prompt template has no system prompt slot
+ prompt_template = f'''[INST] {prompt} [/INST]
+ '''
+
+ print("\n\n*** Generate:")
+
+ input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
+ output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
+ print(tokenizer.decode(output[0]))
+
+ # Inference can also be done using transformers' pipeline
+
+ print("*** Pipeline:")
+ pipe = pipeline(
+     "text-generation",
+     model=model,
+     tokenizer=tokenizer,
+     max_new_tokens=512,
+     do_sample=True,
+     temperature=0.7,
+     top_p=0.95,
+     top_k=40,
+     repetition_penalty=1.1
+ )
+
+ print(pipe(prompt_template)[0]['generated_text'])
+ ```
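As an alternative to the hand-written f-string, a sketch using the tokenizer's bundled chat template (assumes the repo ships one in `tokenizer_config.json`, and reuses `model`, `tokenizer` and `prompt` from the example above):

```python
# Build the [INST] ... [/INST] prompt from the tokenizer's chat template
messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(messages, return_tensors="pt").cuda()
output = model.generate(inputs=input_ids, temperature=0.7, do_sample=True, top_p=0.95, top_k=40, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```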
+ <!-- README_GPTQ.md-use-from-python end -->
+
+ <!-- README_GPTQ.md-compatibility start -->
+ ## Compatibility
+
+ The files provided are tested to work with AutoGPTQ 0.6 (compiled from source) and Transformers 4.37.0 (installed from Github).
+
+ <!-- README_GPTQ.md-compatibility end -->
 
  <!-- footer start -->
  <!-- 200823 -->