TheBloke committed on
Commit 006052d
1 Parent(s): 56d5f5f

Initial GPTQ model commit

Files changed (1): README.md (+11 -4)
README.md CHANGED
@@ -49,12 +49,17 @@ Each separate quant is in a different branch. See below for instructions on fetching from branches.
  | Branch | Filename | Bits | Group Size | Act Order (desc_act) | File Size | ExLlama Compatible? | Made With | Use |
  | ------ | -------- | ---- | ---------- | -------------------- | --------- | ------------------- | --------- | --- |
  | main | open-llama-7b-v2-open-instruct-GPTQ-4bit-128g.no-act.order.safetensors | 4 | 128 | False | 4.00 GB | True | GPTQ-for-LLaMa | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. |
- | gptq-4bit-32g-actorder_True | gptq_model-4bit-32g.safetensors | 4 | 32 | True | 4.28 GB | True | AutoGPTQ | Group size 32g gives highest possible inference quality, with maximum VRAM usage. |
- | gptq-4bit-64g-actorder_True | gptq_model-4bit-64g.safetensors | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | Group size 64g uses less VRAM, but with slightly lower accuracy. |
- | gptq-4bit-128g-actorder_True | gptq_model-4bit-128g.safetensors | 4 | 128 | True | 3.90 GB | True | AutoGPTQ | Group size 128g uses even less VRAM, but with slightly lower accuracy. |
- | gptq-8bit--1g-actorder_True | gptq_model-8bit--1g.safetensors | 8 | None | True | 7.01 GB | False | AutoGPTQ | Group size none has the least VRAM usage, but the lowest accuracy. |
+ | gptq-4bit-32g-actorder_True | gptq_model-4bit-32g.safetensors | 4 | 32 | True | 4.28 GB | True | AutoGPTQ | 4-bit, with Act Order. Group size 32g gives highest possible inference quality, with maximum VRAM usage. |
+ | gptq-4bit-64g-actorder_True | gptq_model-4bit-64g.safetensors | 4 | 64 | True | 4.02 GB | True | AutoGPTQ | 4-bit, with Act Order. Group size 64g uses less VRAM, but with slightly lower accuracy. |
+ | gptq-4bit-128g-actorder_True | gptq_model-4bit-128g.safetensors | 4 | 128 | True | 3.90 GB | True | AutoGPTQ | 4-bit, with Act Order. Group size 128g uses even less VRAM, but with slightly lower accuracy. |
+ | gptq-8bit--1g-actorder_True | gptq_model-8bit--1g.safetensors | 8 | None | True | 7.01 GB | False | AutoGPTQ | 8-bit, with Act Order. Group size none is used to lower VRAM requirements and to improve compatibility. |


+ ## How to download from branches
+
+ - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
+ - With Git, you can clone with: `git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
+
  ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).

  Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
@@ -63,6 +68,8 @@ It is strongly recommended to use the text-generation-webui one-click-installers

  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
+ - To download from a specific branch, enter for example `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
+ - see Provided Files above for the list of branches for each file type.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done"
  5. In the top left, click the refresh icon next to **Model**.
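Beyond the `:branch` suffix in text-generation-webui and the `git clone --branch` command shown above, the same branch names can be fetched programmatically. Here is a minimal sketch using the `huggingface_hub` library's `snapshot_download`, whose `revision` parameter maps directly to the Branch column in the table above; the `local_dir` value is an arbitrary example, not part of the repo:

```python
# Minimal sketch: fetch a single quant branch with huggingface_hub.
# Assumes `pip install huggingface_hub`; local_dir is an illustrative path.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="TheBloke/open-llama-7B-v2-open-instruct-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # any branch from the table above
    local_dir="open-llama-7b-v2-open-instruct-GPTQ-4bit-32g",
)
print(model_dir)  # directory containing gptq_model-4bit-32g.safetensors
```

The same `revision` argument is accepted by `transformers`' `from_pretrained`, so a branch can also be selected at load time rather than at download time.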