Initial GPTQ model commit
README.md CHANGED
@@ -19,9 +19,9 @@ license: other
 
 # VMware's Open Llama 7B v2 Open Instruct GPTQ
 
-These files are GPTQ
+These files are GPTQ model files for [VMware's Open Llama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
 
-
+Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
 ## Repositories available
 
@@ -58,7 +58,11 @@ Each separate quant is in a different branch. See below for instructions on fetching from branches.
 
 ## How to download from branches
 
 - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
-- With Git, you can clone
+- With Git, you can clone a branch with:
+```
+git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ
+```
+- In Python Transformers code, the branch is the `revision` parameter; see below.
 
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
@@ -69,7 +73,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
   - To download from a specific branch, enter for example `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
-  - see Provided Files above for the list of branches for each
+  - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
@@ -99,13 +103,25 @@ use_triton = False
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-        model_basename=model_basename
+        model_basename=model_basename,
         use_safetensors=True,
         trust_remote_code=True,
         device="cuda:0",
         use_triton=use_triton,
         quantize_config=None)
 
+"""
+To download from a specific branch, use the revision parameter, as in this example:
+
+model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+        revision="gptq-4bit-32g-actorder_True",
+        model_basename=model_basename,
+        use_safetensors=True,
+        trust_remote_code=True,
+        device="cuda:0",
+        quantize_config=None)
+"""
+
 prompt = "Tell me about AI"
 prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
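The final hunk stops at the prompt template. As a minimal sketch of how the objects defined in that code are typically used, assuming the `model`, `tokenizer` and `prompt_template` variables from the README snippet above (the sampling settings here are illustrative assumptions, not values from this commit):

```
# Sketch only: continues the AutoGPTQ example above.
# Assumes `model`, `tokenizer` and `prompt_template` exist as in the README code;
# max_new_tokens / temperature are illustrative choices, not taken from this commit.
import torch

input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")

with torch.no_grad():
    output_ids = model.generate(
        input_ids=input_ids,
        do_sample=True,
        temperature=0.7,      # assumed sampling temperature
        max_new_tokens=256,   # assumed generation length
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```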
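The branch-download hunk covers git and the `revision` parameter; as a supplementary sketch, a specific branch can also be fetched with the `huggingface_hub` library, where the branch name again maps to `revision`. The destination directory below is a placeholder assumption.

```
# Supplementary sketch: download one GPTQ branch with huggingface_hub.
# The branch name is passed as `revision`; local_dir is a placeholder path.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/open-llama-7B-v2-open-instruct-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
    local_dir="open-llama-7B-v2-open-instruct-GPTQ",
)
```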