TheBloke committed
Commit 81de15e
1 Parent(s): 390c99d

Update README.md

Files changed (1)
  1. README.md +11 -9
README.md CHANGED
@@ -40,13 +40,15 @@ This repo contains GPTQ model files for [Mistral AI's Mistral 7B v0.1](https://h
 
 Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
-### GPTQs will work in Transformers only - and requires Transformers from Github
+### GPTQs will work in ExLlama, or via Transformers (requiring Transformers from Github)
+
+These models are confirmed to work with ExLlama v1.
 
 At the time of writing (September 28th), AutoGPTQ has not yet added support for the new Mistral models.
 
-These GPTQs were made directly from Transformers, and so can only be loaded via the Transformers interface. They can't be loaded directly from AutoGPTQ.
+These GPTQs were made directly from Transformers, and so can be loaded via the Transformers interface. They can't be loaded directly from AutoGPTQ.
 
-In addition, you will need to install Transformers from Github, with:
+To load them via Transformers, you will need to install Transformers from Github, with:
 ```
 pip3 install git+https://github.com/huggingface/transformers.git@72958fcd3c98a7afdc61f953aa58c544ebda2f79
 ```
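
For reference, once the pinned Transformers commit is installed, loading one of these branches through the Transformers interface might look like the sketch below. This is a minimal sketch, not the repo's official instructions: `device_map="auto"` assumes `accelerate` is installed, and Transformers' GPTQ path typically also expects the `optimum` and `auto-gptq` packages (assumptions here, since the diff pins only the Transformers commit).

```
# Minimal sketch: load a GPTQ branch via the Transformers interface.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/Mistral-7B-v0.1-GPTQ"

# revision selects a quantisation branch from the Provided Files table;
# "main" is the 4-bit / group size 128g build.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",   # assumes `accelerate` is installed
    revision="main",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

inputs = tokenizer("Mistral 7B is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```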
@@ -98,10 +100,10 @@ These files were made with Transformers 4.34.0.dev0, from commit 72958fcd3c98a7a
 
 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-| [main](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | No | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
-| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | No | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
-| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
-| [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | No | 8-bit, with group size 32g and Act Order for maximum inference quality. |
+| [main](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/main) | 4 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. |
+| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. |
+| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.16 GB | Yes | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. |
+| [gptq-8bit-32g-actorder_True](https://huggingface.co/TheBloke/Mistral-7B-v0.1-GPTQ/tree/gptq-8bit-32g-actorder_True) | 8 | 32 | Yes | 0.1 | [wikitext](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test) | 32768 | 4.57 GB | Yes | 8-bit, with group size 32g and Act Order for maximum inference quality. |
 
 <!-- README_GPTQ.md-provided-files end -->
 
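
Each row of the table lives on its own branch of the repo, so a specific quantisation can be fetched by passing the branch name as `revision`. A minimal sketch with `huggingface_hub` (the `local_dir` destination is an arbitrary choice):

```
# Minimal sketch: download one quantisation branch from the table above.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Mistral-7B-v0.1-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # branch name from the table
    local_dir="Mistral-7B-v0.1-GPTQ",        # destination folder (arbitrary)
)
```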
 
@@ -175,9 +177,9 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 <!-- README_GPTQ.md-text-generation-webui start -->
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
-NOTE: These models haven't been tested in text-generation-webui. But I hope they will work.
+These models are confirmed to work via the ExLlama Loader in text-generation-webui.
 
-You will need to use **Loader: Transformers**. AutoGPTQ will not work. I don't know about ExLlama - it might work as this model is so similar to Llama; let me know if it does!
+Use **Loader: ExLlama** - or Transformers may work too. AutoGPTQ will not work.
 
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
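
The ExLlama confirmation also applies outside the webui: the standalone [exllama](https://github.com/turboderp/exllama) project can load these files directly. Below is a minimal sketch modelled on ExLlama v1's bundled example scripts; the model directory is a placeholder for a local download of one of the branches above, and the API shown is an assumption drawn from those examples rather than from this README.

```
# Minimal sketch, modelled on exllama's example scripts (ExLlama v1).
# Run from a checkout of https://github.com/turboderp/exllama so that
# model.py / tokenizer.py / generator.py are importable.
import glob
import os

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "models/Mistral-7B-v0.1-GPTQ"  # placeholder path

config = ExLlamaConfig(os.path.join(model_directory, "config.json"))
config.model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(os.path.join(model_directory, "tokenizer.model"))
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Mistral 7B is", max_new_tokens=64))
```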
 
 