TheBloke committed
Commit 072327d
1 Parent(s): b9c9b41

Initial GPTQ model commit

Files changed (1):
1. README.md +21 -5
README.md CHANGED
@@ -19,9 +19,9 @@ license: other
 
 # VMware's Open Llama 7B v2 Open Instruct GPTQ
 
- These files are GPTQ 4bit model files for [VMware's Open Llama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
+ These files are GPTQ model files for [VMware's Open Llama 7B v2 Open Instruct](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct).
 
- It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa).
+ Multiple GPTQ parameter permutations are provided; see Provided Files below for details of the options provided, their parameters, and the software used to create them.
 
 ## Repositories available
 
@@ -58,7 +58,11 @@ Each separate quant is in a different branch. See below for instructions on fet
 ## How to download from branches
 
 - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
- - With Git, you can clone with: `git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
+ - With Git, you can clone a branch with:
+ ```
+ git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ
+ ```
+ - In Python Transformers code, the branch is the `revision` parameter; see below.
 
 ## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
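
The `revision` mechanics referenced above can also be driven from Python directly. Here is a minimal sketch using `huggingface_hub` (an assumption on my part, not part of the commit: it presumes the package is installed and reasonably recent for `list_repo_refs`; the repo and branch names are the ones this README already uses):

```python
# Sketch: list the quant branches, then download one by passing it as `revision`.
# Assumes `pip install huggingface_hub`; repo/branch names come from the README above.
from huggingface_hub import list_repo_refs, snapshot_download

repo_id = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"

# Each GPTQ parameter permutation lives in its own branch
refs = list_repo_refs(repo_id)
print("Available branches:", [b.name for b in refs.branches])

# Download a specific quant; `revision` selects the branch
local_dir = snapshot_download(repo_id, revision="gptq-4bit-32g-actorder_True")
print("Downloaded to:", local_dir)
```

Passing `revision` to `snapshot_download` is the programmatic equivalent of webui's `:branch` suffix.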
 
@@ -69,7 +73,7 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/open-llama-7B-v2-open-instruct-GPTQ`.
 - To download from a specific branch, enter for example `TheBloke/open-llama-7B-v2-open-instruct-GPTQ:gptq-4bit-32g-actorder_True`
- - see Provided Files above for the list of branches for each file type.
+ - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
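
The branch chosen in step 2 determines which quantised weights you get. To inspect what a branch actually ships before downloading (for instance, to find the `.safetensors` basename used by the Python example further down), here is a hedged sketch, again with `huggingface_hub`; nothing in it comes from the commit itself:

```python
# Sketch: inspect the files on one quant branch. Names here are from this README;
# the exact .safetensors filename is whatever the listing returns.
from huggingface_hub import list_repo_files

repo_id = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"
branch = "gptq-4bit-32g-actorder_True"

# `revision` selects the branch, mirroring webui's `:branch` suffix
for filename in list_repo_files(repo_id, revision=branch):
    if filename.endswith(".safetensors"):
        print(filename)  # strip the extension to get model_basename
```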
@@ -99,13 +103,25 @@ use_triton = False
 tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)
 
 model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
-         model_basename=model_basename,
+         model_basename=model_basename,
          use_safetensors=True,
          trust_remote_code=True,
          device="cuda:0",
          use_triton=use_triton,
          quantize_config=None)
 
+ """
+ To download from a specific branch, use the revision parameter, as in this example:
+
+ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
+         revision="gptq-4bit-32g-actorder_True",
+         model_basename=model_basename,
+         use_safetensors=True,
+         trust_remote_code=True,
+         device="cuda:0",
+         quantize_config=None)
+ """
+
 prompt = "Tell me about AI"
 prompt_template=f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.
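
Assembled, the new version's Python fragments amount to one script. The sketch below consolidates them and adds generation so the example runs end to end. It is not the committed file verbatim: the `model_basename` value and the instruction/response markers in the prompt template are assumptions (check the `.safetensors` filename in your chosen branch and the prompt format on the model card), and the sampling settings are arbitrary.

```python
# Consolidated sketch of the README's AutoGPTQ example (not the committed file verbatim).
# Assumptions: auto-gptq and transformers are installed; model_basename below is a
# placeholder -- use the real .safetensors basename from your chosen branch.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/open-llama-7B-v2-open-instruct-GPTQ"
model_basename = "open-llama-7b-v2-open-instruct-GPTQ-4bit-128g.no-act.order"  # placeholder

use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the quantised weights; quantize_config=None reads quantize_config.json from the repo
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
        model_basename=model_basename,
        use_safetensors=True,
        trust_remote_code=True,
        device="cuda:0",
        use_triton=use_triton,
        quantize_config=None)

prompt = "Tell me about AI"
# Instruction/response markers assumed from the usual Alpaca-style layout; verify against the model card
prompt_template = f'''Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
{prompt}

### Response:
'''

# Tokenize the filled-in template and generate on the GPU
input_ids = tokenizer(prompt_template, return_tensors='pt').input_ids.cuda()
output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
print(tokenizer.decode(output[0]))
```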