TheBloke committed on
Commit 82a8118
1 Parent(s): 95537d2

Update README.md

Files changed (1)
  1. README.md +21 -23
README.md CHANGED
@@ -1,11 +1,13 @@
 ---
 datasets:
 - tiiuae/falcon-refinedweb
+license: apache-2.0
 language:
 - en
 inference: false
 ---
 
+<!-- header start -->
 <div style="width: 100%;">
 <img src="https://i.imgur.com/EBdldam.jpg" alt="TheBlokeAI" style="width: 100%; min-width: 400px; display: block; margin: auto;">
 </div>
@@ -17,7 +19,7 @@ inference: false
 <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
 </div>
 </div>
-
+<!-- header end -->
 
 # Falcon-40B-Instruct GPTQ
 
@@ -29,29 +31,27 @@ It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQi
 
 Please note this is an experimental GPTQ model. Support for it is currently quite limited.
 
-It is also expected to be **VERY SLOW**. This is unavoidable at the moment, but is being looked at.
+It is also expected to be **VERY SLOW**. This is currently unavoidable, but is being looked at.
 
-To use it you will require:
+## AutoGPTQ
 
-1. AutoGPTQ, from the latest `main` branch and compiled with `pip install .`
-2. `pip install einops`
+AutoGPTQ is required: `pip install auto-gptq`
 
-You can then use it immediately from Python code - see example code below - or from text-generation-webui.
+AutoGPTQ provides pre-compiled wheels for Windows and Linux, with CUDA toolkit 11.7 or 11.8.
 
-## AutoGPTQ
+If you are running CUDA toolkit 12.x, you will need to compile your own by following these instructions:
 
-To install AutoGPTQ please follow these instructions:
 ```
 git clone https://github.com/PanQiWei/AutoGPTQ
 cd AutoGPTQ
 pip install .
 ```
 
-These steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
+These manual steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
 
 ## text-generation-webui
 
-There is also provisional AutoGPTQ support in text-generation-webui.
+There is provisional AutoGPTQ support in text-generation-webui.
 
 This requires text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3.
 
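Whichever AutoGPTQ install route from the hunk above is used (wheel or compiled from source), a quick sanity check before loading the model can save time. This is a minimal sketch of my own, not text from this commit:

```
# Minimal install sanity check (assumed workflow, not from the README).
from importlib.metadata import version

import torch
import auto_gptq  # noqa: F401 - raises ImportError if the install failed

print("auto-gptq version:", version("auto-gptq"))
print("CUDA available:", torch.cuda.is_available())
print("PyTorch built against CUDA:", torch.version.cuda)
```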
@@ -78,14 +78,9 @@ In this repo you can see two `.py` files - these are the files that get executed
 
 ## Simple Python example code
 
-To run this code you need to install AutoGPTQ from source:
-```
-git clone https://github.com/PanQiWei/AutoGPTQ
-cd AutoGPTQ
-pip install . # This step requires CUDA toolkit installed
-```
-And install einops:
+To run this code you need to install AutoGPTQ and einops:
 ```
+pip install auto-gptq
 pip install einops
 ```
 
@@ -96,7 +91,7 @@ from transformers import AutoTokenizer
 from auto_gptq import AutoGPTQForCausalLM
 
 # Download the model from HF and store it locally, then reference its location here:
-quantized_model_dir = "/path/to/falcon7b-instruct-gptq"
+quantized_model_dir = "/path/to/falcon40b-instruct-gptq"
 
 from transformers import AutoTokenizer
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=False)
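The hunk above shows only the edges of the README's example script; the unchanged middle is elided by the diff. For orientation, here is a minimal end-to-end sketch of that flow. The `from_quantized` call and its arguments are my assumption based on AutoGPTQ's public API, not text from this commit:

```
# Hedged end-to-end sketch assembling the fragments shown in the diff.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Download the model from HF and store it locally, then reference its location here:
quantized_model_dir = "/path/to/falcon40b-instruct-gptq"

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=False)

# Assumed loading call: trust_remote_code is needed because Falcon ships
# custom modelling code, and use_safetensors matches the provided model file.
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_safetensors=True,
    use_triton=False,  # the provided file does not work with the Triton backend
    trust_remote_code=True,
)

prompt = "Write a story about llamas"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids, max_new_tokens=100)
print(tokenizer.decode(output[0]))
```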
@@ -113,13 +108,13 @@ print(tokenizer.decode(output[0]))
 
 ## Provided files
 
-**gptq_model-4bit.safetensors**
+**gptq_model-4bit-64g.safetensors**
 
 This will work with AutoGPTQ as of commit `3cb1bf5` (`3cb1bf5a6d43a06dc34c6442287965d1838303d3`)
 
-It was created with no groupsize to reduce VRAM requirements as much as possible, with `desc_act` (act-order) to increase inference quality.
+It was created with groupsize 64 to give higher inference quality, and without `desc_act` (act-order) to increase inference speed.
 
-* `gptq_model-4bit.safetensors`
+* `gptq_model-4bit-64g.safetensors`
 * Works only with latest AutoGPTQ CUDA, compiled from source as of commit `3cb1bf5`
 * At this time it does not work with AutoGPTQ Triton, but support will hopefully be added in time.
 * Works with text-generation-webui using `--autogptq --trust_remote_code`
@@ -127,6 +122,7 @@ It was created with no groupsize to reduce VRAM requirements as much as possible
 * Does not work with any version of GPTQ-for-LLaMa
 * Parameters: Groupsize = 64. No act-order.
 
+<!-- footer start -->
 ## Discord
 
 For further support, and discussions on these models and AI in general, join us at: [TheBloke AI's Discord server](https://discord.gg/UBgz4VXf)
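For the `gptq_model-4bit-64g.safetensors` file described in the "Provided files" hunks above, one optional check (my addition, not part of this commit) is to open the download with the safetensors library: a corrupt or partial file fails to parse, and a good one lists its quantised tensors.

```
# Optional download integrity check (assumed helper, not from the README).
from safetensors import safe_open

path = "/path/to/falcon40b-instruct-gptq/gptq_model-4bit-64g.safetensors"
with safe_open(path, framework="pt") as f:
    keys = list(f.keys())
    print(f"{len(keys)} tensors, e.g.:", keys[:5])
```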
@@ -144,9 +140,11 @@ Donaters will get priority support on any and all AI/LLM/model questions, plus o
 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
-**Patreon special mentions**: Aemon Algiz; Talal Aujan; Jonathan Leane; Illia Dulskyi; Khalefa Al-Ahmad;
-senxiiz. Thank you all, and to all my other generous patrons and donaters.
+**Patreon special mentions**: Aemon Algiz; Johann-Peter Hartmann; Talal Aujan; Jonathan Leane; Illia Dulskyi; Khalefa Al-Ahmad; senxiiz; Sebastain Graf; Eugene Pentland; Nikolai Manek; Luke Pendergrass.
 
+Thank you to all my generous patrons and donaters.
+<!-- footer end -->
+
 # ✨ Original model card: Falcon-40B-Instruct
 
 # ✨ Falcon-40B-Instruct
 