TheBloke committed
Commit
c0f4f4d
1 Parent(s): 4cdd51e

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -41,21 +41,21 @@ GGML versions are not yet provided, as there is not yet support for SuperHOT in
 
 ## Repositories available
 
-* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-13B-V1.0-Uncensored-SuperHOT-8K-GPTQ)
-* [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/WizardLM-13B-V1.0-Uncensored-SuperHOT-8K-fp16)
-* [Unquantised base fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/ehartford/WizardLM-13B-V1.0-Uncensored)
+* [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GPTQ)
+* [Unquantised SuperHOT fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-fp16)
+* [Unquantised base fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/ehartford/WizardLM-13B-V1-0-Uncensored)
 
 ## How to easily download and use this model in text-generation-webui with ExLlama
 
 Please make sure you're using the latest version of text-generation-webui
 
 1. Click the **Model tab**.
-2. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-13B-V1.0-Uncensored-SuperHOT-8K-GPTQ`.
+2. Under **Download custom model or LoRA**, enter `TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GPTQ`.
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done"
 5. Untick **Autoload the model**
 6. In the top left, click the refresh icon next to **Model**.
-7. In the **Model** dropdown, choose the model you just downloaded: `WizardLM-13B-V1.0-Uncensored-SuperHOT-8K-GPTQ`
+7. In the **Model** dropdown, choose the model you just downloaded: `WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GPTQ`
 8. To use the increased context, set the **Loader** to **ExLlama**, set **max_seq_len** to 8192 or 4096, and set **compress_pos_emb** to **4** for 8192 context, or to **2** for 4096 context.
 9. Now click **Save Settings** followed by **Reload**
 10. The model will automatically load, and is now ready for use!
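
For readers who would rather fetch the renamed repo from a script than through the webui's **Download** button (steps 1-3 in the diff above), here is a minimal sketch using `huggingface_hub`. It is not part of the original README; the repo id is simply the one from the updated links in this commit.

```python
# Minimal sketch: fetch the renamed GPTQ repo outside text-generation-webui.
# Assumes huggingface_hub is installed (pip install huggingface_hub); the
# repo id is taken from the updated links in the diff above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="TheBloke/WizardLM-13B-V1-0-Uncensored-SuperHOT-8K-GPTQ"
)
print(f"Model files downloaded to: {local_dir}")
```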
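Step 8's **compress_pos_emb** setting is worth a quick illustration: SuperHOT-style models rely on linear position interpolation, dividing each token position by the compression factor so that 8192 (or 4096) positions are squeezed into the 2048-position range the base LLaMA model was trained on. Below is a rough NumPy sketch of that idea; the function name and shapes are ours for illustration, not ExLlama internals.

```python
# Illustrative sketch of linear RoPE position compression (the idea behind
# compress_pos_emb); names are ours, not ExLlama internals.
import numpy as np

def rope_angles(positions, dim=128, base=10000.0, compress_pos_emb=1.0):
    # Standard RoPE inverse frequencies for a head dimension `dim`.
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    # Linear interpolation: divide positions by the compression factor,
    # squeezing e.g. 8192 real positions into the 2048 range seen in training.
    scaled = np.asarray(positions) / compress_pos_emb
    return np.outer(scaled, inv_freq)  # (seq_len, dim/2) rotation angles

# Position 8188 with compress_pos_emb=4 rotates exactly like position 2047
# did during pretraining, keeping the 2048-context base model in-distribution.
assert np.allclose(rope_angles([8188], compress_pos_emb=4),
                   rope_angles([2047], compress_pos_emb=1))
```

This is why step 8 pairs a factor of 4 with 8192 context and 2 with 4096: both choices keep the scaled positions within the original 2048-token training range.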