TheBloke committed
Commit 072327d
1 Parent(s): b9c9b41

Initial GPTQ model commit

Files changed (1):
1. README.md +21 -5
README.md CHANGED
@@ -19,9 +19,9 @@ license: other
 
 # VMware's Open Llama 7B v2 Open Instruct GPTQ
 
- These files are GPTQ 4bit model files for [VMware's Open Llama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
+ These files are GPTQ model files for [VMware's Open Llama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
 
- It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+ Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
 ## Repositories available
 
@@ -58,7 +58,11 @@ Each separate quant is in a different branch. See below for instructions on fet
 ## How to download from branches
 
 - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
- - With Git, you can clone with: `git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
+ - With Git, you can clone a branch with:
+ ```
+ git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ
+ ```
+ - In Python Transformers code, the branch is the `revision` parameter; see below.
 
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
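
The `revision` mechanics referenced above can also be driven from Python directly. Here is a minimal sketch using `huggingface_hub` (an assumption on my part, not part of the commit: it presumes the package is installed and reasonably recent for `list_repo_refs`; the repo and branch names are the ones this README already uses):

```python
# Sketch: list the quant branches, then download one by passing it as `revision`.
# Assumes `pip install huggingface_hub`; repo/branch names come from the README above.
from huggingface_hub import list_repo_refs, snapshot_download

repo_id = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"

# Each GPTQ parameter permutation lives in its own branch
refs = list_repo_refs(repo_id)
print("Available branches:", [b.name for b in refs.branches])

# Download a specific quant; `revision` selects the branch
local_dir = snapshot_download(repo_id, revision="gptq-4bit-32g-actorder_True")
print("Downloaded to:", local_dir)
```

Passing `revision` to `snapshot_download` is the programmatic equivalent of webui's `:branch` suffix.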
 
@@ -69,7 +73,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
 - To download from a specific branch, enter for example `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
- - see Provided Files above for the list of branches for each file type.
+ - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
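
The branch chosen in step 2 determines which quantised weights you get. To inspect what a branch actually ships before downloading (for instance, to find the `.safetensors` basename used by the Python example further down), here is a hedged sketch, again with `huggingface_hub`; nothing in it comes from the commit itself:

```python
# Sketch: inspect the files on one quant branch. Names here are from this README;
# the exact .safetensors filename is whatever the listing returns.
from huggingface_hub import list_repo_files

repo_id = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"
branch = "gptq-4bit-32g-actorder_True"

# `revision` selects the branch, mirroring webui's `:branch` suffix
for filename in list_repo_files(repo_id, revision=branch):
    if filename.endswith(".safetensors"):
        print(filename)  # strip the extension to get model_basename
```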
@@ -99,13 +103,25 @@ use_triton = False
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-         model_basename=model_basename,
+         model_basename=model_basename,
          use_safetensors=True,
          trust_remote_code=True,
          device="cuda:0",
          use_triton=use_triton,
          quantize_config=None)
 
+ """
+ To download from a specific branch, use the revision parameter, as in this example:
+
+ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+         revision="gptq-4bit-32g-actorder_True",
+         model_basename=model_basename,
+         use_safetensors=True,
+         trust_remote_code=True,
+         device="cuda:0",
+         quantize_config=None)
+ """
+
 prompt = "Tell me about AI"
 prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
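
Assembled, the new version's Python fragments amount to one script. The sketch below consolidates them and adds generation so the example runs end to end. It is not the committed file verbatim: the `model_basename` value and the instruction/response markers in the prompt template are assumptions (check the `.safetensors` filename in your chosen branch and the prompt format on the model card), and the sampling settings are arbitrary.

```python
# Consolidated sketch of the README's AutoGPTQ example (not the committed file verbatim).
# Assumptions: auto-gptq and transformers are installed; model_basename below is a
# placeholder -- use the real .safetensors basename from your chosen branch.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"
model_basename = "open-llama-7b-v2-open-instruct-GPTQ-4bit-128g.no-act.order"  # placeholder

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantised weights; quantize_config=None reads quantize_config.json from the repo
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
# Instruction/response markers assumed from the usual Alpaca-style layout; verify against the model card
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

# Tokenize the filled-in template and generate on the GPU
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```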