Initial GPTQ model commit
README.md CHANGED

@@ -27,7 +27,7 @@ These models were quantised using hardware kindly provided by [Latitude.sh](http
 
 ## Repositories available
 
-* [GPTQ models for GPU inference, with multiple quantisation parameter options](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ)
+* [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GPTQ)
 * [2, 3, 4, 5, 6 and 8-bit GGML models for CPU+GPU inference](https://huggingface.co/TheBloke/open-llama-7B-v2-open-instruct-GGML)
 * [Unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/VMware/open-llama-7b-v2-open-instruct)
 
@@ -39,6 +39,7 @@ Below is an instruction that describes a task. Write a response that appropriate
 ### Instruction: {prompt}
 
 ### Response:
+
 ```
 
 ## Provided files
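The second hunk touches the README's prompt template (`### Instruction: {prompt}` … `### Response:`). A minimal sketch of filling that template in, assuming a hypothetical helper name; the preamble sentence is truncated in the hunk header, so only the sections shown in the diff are reproduced here:

```python
# Minimal sketch: fill the prompt template shown in the README diff.
# build_prompt is a hypothetical helper, not part of any library.

def build_prompt(instruction: str) -> str:
    """Insert the user's instruction into the model's prompt template."""
    return (
        f"### Instruction: {instruction}\n"
        "\n"
        "### Response:\n"
    )

prompt = build_prompt("Summarise the benefits of GPTQ quantisation.")
print(prompt)
```

The trailing newline after `### Response:` mirrors the blank line this commit adds before the closing fence.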