TehVenom committed on
Commit
f5924cb
1 Parent(s): dc6b593

Update README.md

Files changed (1): README.md (+7 −1)
@@ -26,9 +26,15 @@ Quantization was done using https://github.com/oobabooga/GPTQ-for-LLaMa for use
 
 Via the following command:
 ```
-python llama.py ./TehVenom_Pygmalion-7b-Merged-Safetensors c4 --wbits 4 --true-sequential --groupsize 32 --save_safetensors Pygmalion-7B-GPTQ-4bit-32g.no-act-order.safetensors
+python llama.py ./TehVenom_Pygmalion-7b-Merged-Safetensors c4 --wbits 4 --act-order --save_safetensors Pygmalion-7B-GPTQ-4bit.act-order.safetensors
 ```
 
+This is the best eval I could get after trying many argument combinations: converting the model from bf16 to fp32 before quantizing down to 4-bit, with --act-order as the sole quantization argument.
+
+- Wikitext 2: 6.2477378845215
+- PTB-New: 46.5129699707031
+- C4-New: 7.8470954895020
+
 ## Prompting
 
 The model was trained on the usual Pygmalion persona + chat format, so any of the usual UIs should already handle everything correctly. If you're using the model directly, this is the expected formatting: