TheBloke committed
Commit 6ae1e4a
1 Parent(s): 41d428b

Update README.md

Files changed (1): README.md +12 -10
README.md CHANGED
@@ -40,13 +40,15 @@ This repo contains GPTQ model files for [Mistral AI's Mistral 7B Instruct v0.1](
 
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
-### GPTQs will work in Transformers only - and requires Transformers from Github
+### GPTQs will work in ExLlama, or via Transformers (requiring Transformers from Github)
+
+These models are confirmed to work with ExLlama v1.
 
 At the time of writing (September 28th), AutoGPTQ has not yet added support for the new Mistral models.
 
-These GPTQs were made directly from Transformers, and so can only be loaded via the Transformers interface. They can't be loaded directly from AutoGPTQ.
+These GPTQs were made directly from Transformers, and so can be loaded via the Transformers interface. They can't be loaded directly from AutoGPTQ.
 
-In addition, you will need to install Transformers from Github, with:
+To load them via Transformers, you will need to install Transformers from Github, with:
 ```
 pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
 ```
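For orientation, loading these GPTQs through the Transformers interface then looks roughly like the sketch below. This is a minimal, untested sketch assuming the pinned Transformers build above is installed; the repo name is real, but the generation settings are illustrative.

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name_or_path = "TheBloke/Mistral-7B-Instruct-v0.1-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
# device_map="auto" places the quantised weights on the available GPU(s)
model = AutoModelForCausalLM.from_pretrained(model_name_or_path, device_map="auto")

# Mistral Instruct v0.1 prompt format
prompt_template = "<s>[INST] Tell me about AI [/INST]"

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```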
@@ -96,10 +98,10 @@ These files were made with Transformers 4.34.0.dev0, from commit 72958fcd3c98a7a
 
 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-| [main](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | No | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
-| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | No | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
-| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 7.68 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
-| [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 8.17 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
+| [main](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
+| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
+| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 7.68 GB | Yes | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
+| [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 8.17 GB | Yes | 8-bit, with group size 32g and Act Order for maximum inference quality. |
 
 <!-- README_GPTQ.md-provided-files end -->
 
@@ -173,9 +175,9 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
-NOTE: These models haven't been tested in text-generation-webui. But I hope they will work.
+These models are confirmed to work via the ExLlama Loader in text-generation-webui.
 
-You will need to use **Loader: Transformers**. AutoGPTQ will not work. I don't know about ExLlama - it might work as this model is so similar to Llama; let me know if it does!
+Use **Loader: ExLlama**; Transformers may also work. AutoGPTQ will not work.
 
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
@@ -265,7 +267,7 @@ print(pipe(prompt_template)[0]['generated_text'])
 <!-- README_GPTQ.md-compatibility start -->
 ## Compatibility
 
-The files provided are only tested to work with Transformers 4.34.0.dev0 as of commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79.
+The files provided are only tested to work with ExLlama v1, and Transformers 4.34.0.dev0 as of commit 72958fcd3c98a7afdc61f953aa58c544ebda2f79.
 
 <!-- README_GPTQ.md-compatibility end -->
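As a quick sanity check that the pinned build above is actually in use (a trivial snippet, nothing repo-specific assumed):

```
import transformers

# The pinned install from the section above should report a 4.34.0 dev build
print(transformers.__version__)  # e.g. 4.34.0.dev0
```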
 
 