Upload README.md

README.md CHANGED
@@ -26,6 +26,7 @@ prompt_template: '<#meta#>
 '
 quantized_by: TheBloke
 ---
+<!-- markdownlint-disable MD041 -->
 
 <!-- header start -->
 <!-- 200823 -->
@@ -185,7 +186,7 @@ Note that using Git with HF repos is strongly discouraged. It will be much slowe
 
 <!-- README_GPTQ.md-download-from-branches end -->
 <!-- README_GPTQ.md-text-generation-webui start -->
-## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
+## How to easily download and use this model in [text-generation-webui](https://github.com/oobabooga/text-generation-webui)
 
 Please make sure you're using the latest version of [text-generation-webui](https://github.com/oobabooga/text-generation-webui).
 
@@ -193,16 +194,20 @@ It is strongly recommended to use the text-generation-webui one-click-installers
 
 1. Click the **Model tab**.
 2. Under **Download custom model or LoRA**, enter `TheBloke/Inkbot-13B-8k-0.2-GPTQ`.
-
-
+
+- To download from a specific branch, enter for example `TheBloke/Inkbot-13B-8k-0.2-GPTQ:gptq-4bit-32g-actorder_True`
+- see Provided Files above for the list of branches for each option.
+
 3. Click **Download**.
 4. The model will start downloading. Once it's finished it will say "Done".
 5. In the top left, click the refresh icon next to **Model**.
 6. In the **Model** dropdown, choose the model you just downloaded: `Inkbot-13B-8k-0.2-GPTQ`
 7. The model will automatically load, and is now ready for use!
 8. If you want any custom settings, set them and then click **Save settings for this model** followed by **Reload the Model** in the top right.
-
-
+
+- Note that you do not need to and should not set manual GPTQ parameters any more. These are set automatically from the file `quantize_config.json`.
+
+9. Once you're ready, click the **Text Generation** tab and enter a prompt to get started!
 
 <!-- README_GPTQ.md-text-generation-webui end -->
 
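The branch bullets added in this hunk can also be scripted outside the UI. A minimal sketch using `huggingface_hub`'s `snapshot_download`, assuming `huggingface_hub` is installed; the `local_dir` path is an illustrative choice, not taken from the README:

```python
# Sketch: download the gptq-4bit-32g-actorder_True branch without text-generation-webui.
# local_dir is an arbitrary example path, not from the README.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="TheBloke/Inkbot-13B-8k-0.2-GPTQ",
    revision="gptq-4bit-32g-actorder_True",  # one of the branches listed under Provided Files
    local_dir="Inkbot-13B-8k-0.2-GPTQ",
)
```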
@@ -214,7 +219,7 @@ It's recommended to use TGI version 1.1.0 or later. The official Docker containe
 Example Docker parameters:
 
 ```shell
---model-id TheBloke/Inkbot-13B-8k-0.2-GPTQ --port 3000 --quantize
+--model-id TheBloke/Inkbot-13B-8k-0.2-GPTQ --port 3000 --quantize gptq --max-input-length 3696 --max-total-tokens 4096 --max-batch-prefill-tokens 4096
 ```
 
 Example Python code for interfacing with TGI (requires huggingface-hub 0.17.0 or later):
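The README's "Example Python code for interfacing with TGI" follows in the full file; as a minimal sketch of what such a client can look like, assuming a TGI server started with the Docker parameters above (so listening on port 3000) and illustrative generation settings:

```python
# Sketch: query a local TGI server started with the parameters shown above.
# Assumes huggingface-hub >= 0.17.0 and TGI listening on http://127.0.0.1:3000.
from huggingface_hub import InferenceClient

client = InferenceClient(model="http://127.0.0.1:3000")

output = client.text_generation(
    "Tell me about AI",   # example prompt
    max_new_tokens=128,   # illustrative values, not from the README
    temperature=0.7,
)
print(output)
```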