Update README.md
README.md CHANGED
@@ -30,9 +30,10 @@ It is the result of quantising to 4bit using [GPTQ-for-LLaMa](https://github.com
 ## Repositories available
 
 * [4-bit GPTQ models for GPU inference](https://huggingface.co/TheBloke/openchat_v2-GPTQ)
-* [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/openchat_v2-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/openchat/openchat_v2)
 
+GGML models have not been made due to the custom prompt templating required, which I believe can't work with GGML at this time.
+
 ## Prompt template: custom
 
 This model uses a custom prompt template. This will likely mean it will NOT work in UIs like text-generation-webui until special support is added.
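Note (not part of the original commit): since generic chat UIs won't know this model's custom template, one workaround is to build the prompt string by hand before generation. The sketch below assumes `transformers` with a GPTQ backend (optimum/auto-gptq) installed; the template string and end-of-turn token are illustrative assumptions, not the model's confirmed format, so consult the model card for the real template.

```python
# A minimal sketch, assuming transformers with the optimum/auto-gptq
# GPTQ backend installed. The prompt format below is a hypothetical
# placeholder, NOT the confirmed openchat_v2 template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TheBloke/openchat_v2-GPTQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build the prompt by hand, since a generic UI won't apply this template.
user_message = "Write a haiku about quantisation."
prompt = f"User: {user_message}<|end_of_turn|>Assistant:"  # assumed format

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```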