Update README.md
README.md
CHANGED
@@ -31,12 +31,15 @@ Please read carefully below to see how to use it.
 
 **NOTE**: Using the full 8K context will exceed 24GB VRAM.
 
+GGML versions are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
+
 ## Repositories available
 
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/WizardLM-33B-V1.0-Uncensored-SuperHOT-8KGPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/none)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/Panchovix/WizardLM-33B-V1.0-Uncensored-SuperHOT-8k)
 
+GGML quants are not yet provided, as there is not yet support for SuperHOT in llama.cpp. This is being investigated and will hopefully come soon.
+
 ## How to easily download and use this model in text-generation-webui
 
 Please make sure you're using the latest version of text-generation-webui