Tags: Text Generation · Transformers · Safetensors · English · llama · text-generation-inference · 4-bit precision · gptq
Commit 6a7e92e (1 parent: aa73c8b)
Committed by TheBloke

Initial GPTQ model commit

Files changed (1)
  1. README.md +9 -9
README.md CHANGED
@@ -63,22 +63,22 @@ Each separate quant is in a different branch. See below for instructions on fet

  | Branch | Bits | Group Size | Act Order (desc_act) | File Size | ExLlama Compatible? | Made With | Description |
  | ------ | ---- | ---------- | -------------------- | --------- | ------------------- | --------- | ----------- |
- | main | 4 | None | True | 36652374352.00 GB | True | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
- | gptq-4bit-128g-actorder_False | 4 | 128 | False | 36.65 GB | True | AutoGPTQ | 4-bit, without Act Order and group size 128g. |
- | gptq-4bit-32g-actorder_True | 4 | 32 | True | Processing, coming soon | True | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
+ | main | 4 | None | True | 35.33 GB | True | AutoGPTQ | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
+ | gptq-4bit-32g-actorder_True | 4 | 32 | True | 40.66 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
+ | gptq-4bit-128g-actorder_True | 4 | 128 | True | 36.65 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
  | gptq-4bit-64g-actorder_True | 4 | 64 | True | 37.99 GB | True | AutoGPTQ | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
- | gptq-4bit-128g-actorder_True | 4 | 128 | True | Processing, coming soon | True | AutoGPTQ | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
  | gptq-3bit--1g-actorder_True | 3 | None | True | 26.78 GB | False | AutoGPTQ | 3-bit, with Act Order and no group size. Lowest possible VRAM requirements. May be lower quality than 3-bit 128g. |
  | gptq-3bit-128g-actorder_False | 3 | 128 | False | 28.03 GB | False | AutoGPTQ | 3-bit, with group size 128g but no act-order. Slightly higher VRAM requirements than 3-bit None. |
  | gptq-3bit-128g-actorder_True | 3 | 128 | True | 28.03 GB | False | AutoGPTQ | 3-bit, with group size 128g and act-order. Higher quality than 128g-False but poor AutoGPTQ CUDA speed. |
- | gptq-3bit-64g-actorder_True | 3 | 64 | True | 29.30 GB | False | AutoGPTQ | 3-bit, with group size 64g and act-order. Highest quality 3-bit option. Poor AutoGPTQ CUDA speed. |
+ | gptq-3bit-64g-actorder_True | 3 | 64 | True | 29.30 GB | False | AutoGPTQ | 3-bit, with group size 64g and act-order. Highest quality 3-bit option. Poor AutoGPTQ CUDA speed. |
+ | gptq-4bit-128g-actorder_False | 4 | 128 | False | 36.65 GB | True | AutoGPTQ | 4-bit, without Act Order and group size 128g. |

  ## How to download from branches

- - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/FreeWilly2-GPTQ:gptq-4bit-128g-actorder_False`
+ - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/FreeWilly2-GPTQ:gptq-4bit-32g-actorder_True`
  - With Git, you can clone a branch with:
  ```
- git clone --branch gptq-4bit-128g-actorder_False https://huggingface.co/TheBloke/FreeWilly2-GPTQ`
+ git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/FreeWilly2-GPTQ`
  ```
  - In Python Transformers code, the branch is the `revision` parameter; see below.

@@ -90,7 +90,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers

  1. Click the **Model tab**.
  2. Under **Download custom model or LoRA**, enter `TheBloke/FreeWilly2-GPTQ`.
-   - To download from a specific branch, enter for example `TheBloke/FreeWilly2-GPTQ:gptq-4bit-128g-actorder_False`
+   - To download from a specific branch, enter for example `TheBloke/FreeWilly2-GPTQ:gptq-4bit-32g-actorder_True`
    - see Provided Files above for the list of branches for each option.
  3. Click **Download**.
  4. The model will start downloading. Once it's finished it will say "Done"

@@ -132,7 +132,7 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
  To download from a specific branch, use the revision parameter, as in this example:

  model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-         revision="gptq-4bit-128g-actorder_False",
+         revision="gptq-4bit-32g-actorder_True",
          model_basename=model_basename,
          use_safetensors=True,
          trust_remote_code=False,
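
The updated download instructions all point at the `gptq-4bit-32g-actorder_True` branch. As a supplement (not part of this commit), the same branch can also be fetched from Python with the huggingface_hub client; this is a minimal sketch assuming a recent `huggingface_hub` release, and the `local_dir` path is only an example.

```python
# Minimal sketch: download a single quantized branch of the repo.
# Assumes `pip install huggingface_hub`; local_dir is an example path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/FreeWilly2-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # branch name from the table above
    local_dir="FreeWilly2-GPTQ-gptq-4bit-32g-actorder_True",  # example destination
)
```

Equivalently, `git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/FreeWilly2-GPTQ` clones only that branch.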
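
The final hunk only swaps the `revision` value inside the README's AutoGPTQ example. For context, here is a hedged sketch of how that call fits together when loading the branch selected in this commit; the `model_basename` value is a placeholder rather than the repository's actual file basename, and the surrounding code follows the general auto-gptq API rather than this README verbatim.

```python
# Sketch: load the gptq-4bit-32g-actorder_True branch with AutoGPTQ + Transformers.
# Assumes `pip install auto-gptq transformers`; model_basename is a placeholder,
# use the safetensors basename actually shipped in that branch.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/FreeWilly2-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    revision="gptq-4bit-32g-actorder_True",  # branch name passed as revision
    model_basename="model",                  # placeholder basename
    use_safetensors=True,
    trust_remote_code=False,
    device="cuda:0",
)
```

Passing the branch name as `revision` is what makes `from_quantized` pull the quantized weights from that branch instead of `main`.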