TheBloke committed dec344e (1 parent: 6f63ca1)

Upload README.md

  1. README.md +1 -0
README.md CHANGED
@@ -82,6 +82,7 @@ Multiple GPTQ parameter permutations are provided; see Provided Files below for
 
  * [AWQ model(s) for GPU inference.](https://huggingface.co/TheBloke/CausalLM-14B-AWQ)
  * [GPTQ models for GPU inference, with multiple quantisation parameter options.](https://huggingface.co/TheBloke/CausalLM-14B-GPTQ)
+ * [2, 3, 4, 5, 6 and 8-bit GGUF models for CPU+GPU inference](https://huggingface.co/TheBloke/CausalLM-14B-GGUF)
  * [CausalLM's original unquantised fp16 model in pytorch format, for GPU inference and for further conversions](https://huggingface.co/CausalLM/14B)
  <!-- repositories-available end -->
 