sardukar committed on
Commit
b4f84b3
1 Parent(s): 091fd08

Updated quantization for llama13b-4bit


GPTQ implementation - https://github.com/qwopqwop200/GPTQ-for-LLaMa/tree/49efe0b67db4b40eac2ae963819ebc055da64074
Conversion process:
`CUDA_VISIBLE_DEVICES=0 python llama.py ./llama-13b c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors ./q4/llama13b-4bit-ts-ao-g128-v2.safetensors`
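To illustrate what the `--wbits 4 --groupsize 128` flags mean, here is a minimal round-to-nearest sketch of groupwise 4-bit quantization. This is not the GPTQ algorithm itself (GPTQ minimizes layer output error using second-order statistics, and `--act-order` reorders columns by activation magnitude); it only shows the storage scheme: each group of 128 consecutive weights shares one scale/offset pair, and each weight is stored as a 4-bit integer in `[0, 15]`.

```python
def quantize_group(group):
    """Quantize one group of floats to 4-bit ints plus (scale, offset)."""
    lo, hi = min(group), max(group)
    scale = (hi - lo) / 15.0 or 1.0  # avoid div-by-zero for constant groups
    q = [min(15, max(0, round((w - lo) / scale))) for w in group]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale + lo for v in q]

def quantize_4bit(weights, groupsize=128):
    """Split a flat weight list into groups and quantize each independently."""
    groups = [weights[i:i + groupsize] for i in range(0, len(weights), groupsize)]
    return [quantize_group(g) for g in groups]
```

Smaller group sizes give each scale less dynamic range to cover, improving accuracy at the cost of storing more metadata; `--groupsize 128` is the common middle ground used in this conversion.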

llama13b-4bit-ts-ao-g128-v2.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:9028da35db0525014be99eb26a5afba6e006daaa9135eaa0e857453d4a299eee
+size 7255159218
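The added file is a Git LFS pointer: the actual weights are stored by their SHA-256 object id (`oid`) and byte size. A downloaded copy can be checked against those two fields with a short sketch like the following; the file path is an assumption, so point it at your local copy.

```python
import hashlib
import os

def verify_lfs_object(path, expected_sha256, expected_size):
    """Return True if the file at `path` matches the LFS pointer's oid/size."""
    # Cheap check first: a size mismatch means no hashing is needed.
    if os.path.getsize(path) != expected_size:
        return False
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so a ~7 GB weight file never loads fully into RAM.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest() == expected_sha256

# Example (hypothetical local path):
# verify_lfs_object(
#     "./q4/llama13b-4bit-ts-ao-g128-v2.safetensors",
#     "9028da35db0525014be99eb26a5afba6e006daaa9135eaa0e857453d4a299eee",
#     7255159218,
# )
```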