TheBloke committed on
Commit cac9c5d
1 Parent(s): b52b4fa

Update README.md

Files changed (1)
  1. README.md +15 -7
README.md CHANGED
@@ -1,6 +1,13 @@
  ---
  inference: false
- license: other
+ language:
+ - en
+ datasets:
+ - guanaco
+ model_hub_library:
+ - transformers
+ license:
+ - apache-2.0
  ---

  <!-- header start -->
@@ -26,7 +33,7 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
  ## Repositories available

  * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ)
- * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML)
+ * [4, 5, and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML)
  * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/LoupGarou/WizardCoder-Guanaco-15B-V1.0)

  ## Prompt template: Alpaca
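The prompt-template section itself lies outside the changed hunks, so its body does not appear in this diff. For reference, the Alpaca-style format the heading above refers to typically looks like the following (an assumption based on the standard Alpaca template; the exact wording in the README may differ):

```
Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction: {prompt}

### Response:
```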
@@ -53,9 +60,10 @@ It is strongly recommended to use the text-generation-webui one-click-installers
  5. In the top left, click the refresh icon next to **Model**.
  6. In the **Model** dropdown, choose the model you just downloaded: `WizardCoder-Guanaco-15B-V1.0-GPTQ`
  7. The model will automatically load, and is now ready for use!
- 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
+ 8. If you have problems, make sure that **Loader** is set to **AutoGPTQ**.
+ 9. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
  * Note that you do not need to set GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
- 9. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!
+ 10. Once you're ready, click the **Text Generation tab** and enter a prompt to get started!

  ## How to use this GPTQ model from Python code

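The next hunk is anchored inside the Python example that follows this heading (its hunk header ends with `print(pipe(prompt_template)[0]['generated_text'])`), but the example itself is not reproduced in the diff. As a minimal sketch of how a 4-bit GPTQ checkpoint like this one is typically loaded with AutoGPTQ and a transformers pipeline (the repo name and safetensors basename are taken from the diff; the generation settings and the example prompt are assumptions):

```python
from transformers import AutoTokenizer, pipeline
from auto_gptq import AutoGPTQForCausalLM

model_name_or_path = "TheBloke/WizardCoder-Guanaco-15B-V1.0-GPTQ"
# Basename of the .safetensors file listed further down in the diff.
model_basename = "wizardcoder-guanaco-15b-v1.0-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# GPTQ parameters (4-bit, group_size 128, desc_act False) are picked up
# automatically from quantize_config.json in the repo.
model = AutoGPTQForCausalLM.from_quantized(
    model_name_or_path,
    model_basename=model_basename,
    use_safetensors=True,
    device="cuda:0",
    use_triton=False,  # CUDA kernels; Triton mode is reported as problematic
)

# Alpaca-style prompt, per the "Prompt template: Alpaca" section.
prompt = "Write a Python function that reverses a string."
prompt_template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    f"### Instruction: {prompt}\n\n### Response:"
)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=256,
    do_sample=True,
    temperature=0.7,
)
print(pipe(prompt_template)[0]['generated_text'])
```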
@@ -127,14 +135,14 @@ print(pipe(prompt_template)[0]['generated_text'])

  This will work with AutoGPTQ and CUDA versions of GPTQ-for-LLaMa. There are reports of issues with Triton mode of recent GPTQ-for-LLaMa. If you have issues, please use AutoGPTQ instead.

- If a Llama model, it will also be supported by ExLlama, which will provide 2x speedup over AutoGPTQ and GPTQ-for-LLaMa.
+ As this is not a Llama model, it will not be supported by ExLlama.

  It was created with group_size 128 to increase inference accuracy, but without --act-order (desc_act) to increase compatibility and improve inference speed.

  * `wizardcoder-guanaco-15b-v1.0-GPTQ-4bit-128g.no-act.order.safetensors`
  * Works with AutoGPTQ in CUDA or Triton modes.
- * [ExLlama](https://github.com/turboderp/exllama) supports Llama 4-bit GPTQs, and will provide 2x speedup over AutoGPTQ and GPTQ-for-LLaMa.
- * Works with GPTQ-for-LLaMa in CUDA mode. May have issues with GPTQ-for-LLaMa Triton mode.
+ * Does NOT work with [ExLlama](https://github.com/turboderp/exllama).
+ * Untested with GPTQ-for-LLaMa.
  * Works with text-generation-webui, including one-click-installers.
  * Parameters: Groupsize = 128. Act Order / desc_act = False.

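For reference, these are the values that text-generation-webui and AutoGPTQ read from `quantize_config.json` (as noted in the install steps above). A minimal sketch of that file, assuming the usual AutoGPTQ field names and keeping only the parameters stated in this README; the actual file in the repo may carry additional fields such as `damp_percent` or `sym`:

```json
{
  "bits": 4,
  "group_size": 128,
  "desc_act": false
}
```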