Update README.md
README.md
CHANGED
@@ -40,8 +40,8 @@ None
 
 ## Repositories available
 
-* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/StableBeluga2-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/StableBeluga2-GGML)
+* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/StableBeluga2-70B-GPTQ)
+* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/StableBeluga2-70B-GGML)
 * [Stability AI's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/stabilityai/StableBeluga2)
 
 ## Prompt template: Orca-Hashes
@@ -79,7 +79,7 @@ Each separate quant is in a different branch. See below for instructions on fetching from different branches.
 - In text-generation-webui, you can add `:branch` to the end of the download name, eg `TheBloke/StableBeluga2-GPTQ:gptq-4bit-32g-actorder_True`
 - With Git, you can clone a branch with:
 ```
-git clone --branch gptq-4bit-32g-actorder_True https://huggingface.co/TheBloke/StableBeluga2-GPTQ
+git clone --branch gptq-4bit-32g-actorder_True --single-branch https://huggingface.co/TheBloke/StableBeluga2-GPTQ
 ```
 - In Python Transformers code, the branch is the `revision` parameter; see below.
 
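As a companion to the hunk above, a minimal sketch of the `revision` parameter the README points to (the repo and branch names are the ones quoted above; the snippet is illustrative, not part of the commit):

```python
from transformers import AutoTokenizer

# Branch names such as gptq-4bit-32g-actorder_True are passed as `revision`.
tokenizer = AutoTokenizer.from_pretrained(
    "TheBloke/StableBeluga2-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
)
```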
@@ -90,13 +90,13 @@ Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 It is strongly recommended to use the text-generation-webui one-click-installers unless you know how to make a manual install.
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/StableBeluga2-GPTQ`.
-  - To download from a specific branch, enter for example `TheBloke/StableBeluga2-GPTQ:gptq-4bit-32g-actorder_True`
+2. Under **Download custom model or LoRA**, enter `TheBloke/StableBeluga2-70B-GPTQ`.
+  - To download from a specific branch, enter for example `TheBloke/StableBeluga2-70B-GPTQ:gptq-4bit-32g-actorder_True`
   - see Provided Files above for the list of branches for each option.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. In the top left, click the refresh icon next to **Model**.
-6. In the **Model** dropdown, choose the model you just downloaded: `StableBeluga2-GPTQ`
+6. In the **Model** dropdown, choose the model you just downloaded: `StableBeluga2-70B-GPTQ`
 7. The model will automatically load, and is now ready for use!
 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
 * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
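The numbered steps above cover the webui flow; the same branch download can also be done programmatically. A sketch assuming the `huggingface_hub` package (not part of this commit; `local_dir` is an illustrative path):

```python
from huggingface_hub import snapshot_download

# Fetch one quant branch of the renamed repo; omit `revision` for main.
snapshot_download(
    repo_id="TheBloke/StableBeluga2-70B-GPTQ",
    revision="gptq-4bit-32g-actorder_True",
    local_dir="models/StableBeluga2-70B-GPTQ",  # illustrative destination
)
```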
@@ -114,7 +114,7 @@ Then try the following example code:
 from transformers import AutoTokenizer, pipeline, logging
 from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig
 
-model_name_or_path = "TheBloke/StableBeluga2-GPTQ"
+model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ"
 model_basename = "gptq_model-4bit--1g"
 
 use_triton = False
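The hunk shows only the opening lines of the README's Python example. READMEs in this family typically continue with the standard AutoGPTQ loading call; a sketch under that assumption (the `from_quantized` arguments shown are the usual ones, not text from this diff):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/StableBeluga2-70B-GPTQ"
model_basename = "gptq_model-4bit--1g"
use_triton = False

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# Load the GPTQ-quantised checkpoint; quantisation parameters come from
# quantize_config.json, so quantize_config can stay None.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    trust_remote_code=False,
    device="cuda:0",
    use_triton=use_triton,
    quantize_config=None,
)
```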