Initial GPTQ model commit
README.md CHANGED

@@ -27,7 +27,7 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
 
 ## Repositories available
 
-* [GPTQ models for GPU inference, with multiple quantisation parameter options](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ)
+* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
 
@@ -39,6 +39,7 @@ Below is an instruction that describes a task. Write a response that appropriate
 ### Instruction: {prompt}
 
 ### Response:
+
 ```
 
 ## Provided files
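The second hunk touches the README's prompt template (`### Instruction: {prompt}` … `### Response:`). A minimal sketch of filling that template in, assuming a hypothetical helper name; the preamble sentence is truncated in the hunk header, so only the sections shown in the diff are reproduced here:

```python
# Minimal sketch: fill the prompt template shown in the README diff.
# build_prompt is a hypothetical helper, not part of any library.

def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the model's prompt template."""
    return (
        f"### Instruction: {instruction}\n"
        "\n"
        "### Response:\n"
    )

prompt = build_prompt("Summarise the benefits of GPTQ quantisation.")
print(prompt)
```

The trailing newline after `### Response:` mirrors the blank line this commit adds before the closing fence.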