Initial GPTQ model commit
README.md
CHANGED
@@ -45,10 +45,12 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference (deprecated)](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGML)
 * [Meta's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-fp16)

-## Prompt template:

 ```
-
 ```

 ## Provided files and GPTQ parameters
@@ -74,12 +76,12 @@ All GPTQ files are made with AutoGPTQ.

 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
-| [main](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/main) | 4 | 128 | No | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |
-| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |
-| [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |
-| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |
-| [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |
-| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) |

 ## How to download from branches

@@ -139,7 +141,7 @@ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         use_safetensors=True,
-        trust_remote_code=
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)
@@ -151,13 +153,15 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         revision="gptq-4bit-32g-actorder_True",
         use_safetensors=True,
-        trust_remote_code=
         device="cuda:0",
         quantize_config=None)
 """

 prompt = "Tell me about AI"
-prompt_template=f'''
 '''

 print("\n\n*** Generate:")
@@ -214,7 +218,7 @@ Donaters will get priority support on any and all AI/LLM/model questions and req

 **Special thanks to**: Aemon Algiz.

-**Patreon special mentions**:


 Thank you to all my generous patrons and donaters!

@@ -45,10 +45,12 @@
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference (deprecated)](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GGML)
 * [Meta's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-fp16)

+## Prompt template: CodeLlama

 ```
+[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
+{prompt}
+[/INST]
 ```
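
The template added in this hunk can be filled in programmatically. The following is a minimal sketch, not part of the commit: the constant and function names are hypothetical, and the template text is copied from the lines above. It uses `str.replace` rather than `str.format` on the assumption that user prompts may themselves contain braces.

```python
# Hypothetical helper; only the template wording comes from the README diff.
FENCE = "`" * 3  # three backticks, built indirectly to keep this block readable

CODELLAMA_INSTRUCT_TEMPLATE = (
    "[INST] Write code to solve the following coding problem that obeys the "
    "constraints and passes the example test cases. "
    "Please wrap your code answer using " + FENCE + ":\n"
    "{prompt}\n"
    "[/INST]\n"
)


def build_prompt(prompt: str) -> str:
    """Insert the user's request into the CodeLlama instruct template."""
    # str.replace leaves any other braces in the prompt untouched,
    # which str.format would try to interpret as placeholders.
    return CODELLAMA_INSTRUCT_TEMPLATE.replace("{prompt}", prompt)


if __name__ == "__main__":
    print(build_prompt("Write a function that reverses a string."))
```

The returned string is what would be passed to the tokenizer in place of the `prompt_template` f-string shown later in this diff.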

 ## Provided files and GPTQ parameters

@@ -74,12 +76,12 @@

 | Branch | Bits | GS | Act Order | Damp % | GPTQ Dataset | Seq Len | Size | ExLlama | Desc |
 | ------ | ---- | -- | --------- | ------ | ------------ | ------- | ---- | ------- | ---- |
+| [main](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/main) | 4 | 128 | No | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 3.90 GB | Yes | Most compatible option. Good inference speed in AutoGPTQ and GPTQ-for-LLaMa. Lower inference quality than other options. |
+| [gptq-4bit-32g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-32g-actorder_True) | 4 | 32 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 4.28 GB | Yes | 4-bit, with Act Order and group size 32g. Gives highest possible inference quality, with maximum VRAM usage. Poor AutoGPTQ CUDA speed. |
+| [gptq-4bit-64g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-64g-actorder_True) | 4 | 64 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 4.02 GB | Yes | 4-bit, with Act Order and group size 64g. Uses less VRAM than 32g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+| [gptq-4bit-128g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-4bit-128g-actorder_True) | 4 | 128 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 3.90 GB | Yes | 4-bit, with Act Order and group size 128g. Uses even less VRAM than 64g, but with slightly lower accuracy. Poor AutoGPTQ CUDA speed. |
+| [gptq-8bit--1g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-8bit--1g-actorder_True) | 8 | None | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 7.01 GB | No | 8-bit, with Act Order. No group size, to lower VRAM requirements and to improve AutoGPTQ speed. |
+| [gptq-8bit-128g-actorder_True](https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ/tree/gptq-8bit-128g-actorder_True) | 8 | 128 | Yes | 0.1 | [Evol Instruct Code](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1) | 4096 | 7.16 GB | No | 8-bit, with group size 128g for higher inference quality and with Act Order for even higher accuracy. Poor AutoGPTQ CUDA speed. |
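
The non-`main` branch names in the table appear to encode the Bits, GS and Act Order columns, with `-1g` standing for a group size of None. The sketch below decodes a branch name on that assumption. The pattern is inferred only from the rows shown here, and `GPTQBranch` and `parse_branch` are hypothetical names, not part of AutoGPTQ or this repository.

```python
import re
from typing import NamedTuple, Optional


class GPTQBranch(NamedTuple):
    bits: int
    group_size: Optional[int]  # None when the branch name uses "-1g"
    act_order: bool


# Pattern inferred from the branch names in the table; an assumption, not a spec.
_BRANCH_RE = re.compile(r"^gptq-(\d+)bit-(-?\d+)g-actorder_(True|False)$")


def parse_branch(name: str) -> GPTQBranch:
    """Decode bits, group size and Act Order from a branch name."""
    match = _BRANCH_RE.match(name)
    if match is None:
        # "main" and any other free-form branch names carry no encoded parameters.
        raise ValueError(f"not a recognised GPTQ branch name: {name!r}")
    bits, group, act = match.groups()
    group_size = int(group)
    return GPTQBranch(
        bits=int(bits),
        group_size=None if group_size == -1 else group_size,
        act_order=(act == "True"),
    )


if __name__ == "__main__":
    print(parse_branch("gptq-4bit-32g-actorder_True"))
    print(parse_branch("gptq-8bit--1g-actorder_True"))
```

The `main` branch does not follow the pattern, so its parameters (4-bit, group size 128, no Act Order) have to be read from the table row itself.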

 ## How to download from branches
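
The README's own download instructions fall outside this hunk, so the following is only a hedged sketch of one common approach: cloning a single branch with git. The repository URL is taken from the table links above, `--single-branch` and `--branch` are standard `git clone` options, and `clone_command` is a hypothetical helper that builds the command without running it.

```python
import shlex
from typing import List

# Repository URL as it appears in the branch links of the table above.
REPO_URL = "https://huggingface.co/TheBloke/CodeLlama-7B-Instruct-GPTQ"


def clone_command(branch: str = "main") -> List[str]:
    """Build a git command that fetches only the requested branch."""
    return ["git", "clone", "--single-branch", "--branch", branch, REPO_URL]


if __name__ == "__main__":
    # Print the command for inspection; pass the list to subprocess.run to execute it.
    print(shlex.join(clone_command("gptq-4bit-32g-actorder_True")))
```

Returning an argument list rather than a single string keeps the branch name from being reinterpreted by a shell if the command is later executed.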
@@ -139,7 +141,7 @@

 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         use_safetensors=True,
+        trust_remote_code=False,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)

@@ -151,13 +153,15 @@
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
         revision="gptq-4bit-32g-actorder_True",
         use_safetensors=True,
+        trust_remote_code=False,
         device="cuda:0",
         quantize_config=None)
 """

 prompt = "Tell me about AI"
+prompt_template=f'''[INST] Write code to solve the following coding problem that obeys the constraints and passes the example test cases. Please wrap your code answer using ```:
+{prompt}
+[/INST]
 '''

 print("\n\n*** Generate:")

@@ -214,7 +218,7 @@

 **Special thanks to**: Aemon Algiz.

+**Patreon special mentions**: Kacper Wikieł, knownsqashed, Leonard Tan, Asp the Wyvern, Daniel P. Andersen, Luke Pendergrass, Stanislav Ovsiannikov, RoA, Dave, Ai Maven, Kalila, Will Dee, Imad Khwaja, Nitin Borwankar, Joseph William Delisle, Tony Hughes, Cory Kujawski, Rishabh Srivastava, Russ Johnson, Stephen Murray, Lone Striker, Johann-Peter Hartmann, Elle, J, Deep Realms, SuperWojo, Raven Klaugh, Sebastain Graf, ReadyPlayerEmma, Alps Aficionado, Mano Prime, Derek Yates, Gabriel Puliatti, Mesiah Bishop, Magnesian, Sean Connelly, biorpg, Iucharbius, Olakabola, Fen Risland, Space Cruiser, theTransient, Illia Dulskyi, Thomas Belote, Spencer Kim, Pieter, John Detwiler, Fred von Graf, Michael Davis, Swaroop Kallakuri, subjectnull, Clay Pascal, Subspace Studios, Chris Smitley, Enrico Ros, usrbinkat, Steven Wood, alfie_i, David Ziegler, Willem Michiel, Matthew Berman, Andrey, Pyrater, Jeffrey Morgan, vamX, LangChain4j, Luke @flexchar, Trenton Dambrowitz, Pierre Kircher, Alex, Sam, James Bentley, Edmond Seymore, Eugene Pentland, Pedro Madruga, Rainer Wilmers, Dan Guido, Nathan LeClaire, Spiking Neurons AB, Talal Aujan, zynix, Artur Olbinski, Michael Levine, 阿明, K, John Villwock, Nikolai Manek, Femi Adebogun, senxiiz, Deo Leter, NimbleBox.ai, Viktor Bowallius, Geoffrey Montalvo, Mandus, Ajan Kanaga, ya boyyy, Jonathan Leane, webtim, Brandon Frisco, danny, Alexandros Triantafyllidis, Gabriel Tamborski, Randy H, terasurfer, Vadim, Junyu Yang, Vitor Caleffi, Chadd, transmissions 11


 Thank you to all my generous patrons and donaters!