QMB15 committed on
Commit
2a5a821
1 Parent(s): a6fc032

Create README.md

Files changed (1)
  1. README.md +2 -0
README.md ADDED
@@ -0,0 +1,2 @@
+ This is https://huggingface.co/ehartford/Wizard-Vicuna-30B-Uncensored, merged with https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test, then quantized to 4-bit with AutoGPTQ.
+ There are two quantized versions. One is a plain 4-bit version with act-order only and no groupsize. The other is an experimental version using groupsize 128, act-order, and kaiokendev's ScaledLLamaAttention monkey patch applied *during* quantization, the idea being to help the calibration account for the new scale. It seems to have worked: perplexity improves by around 0.04 vs. the unpatched quant. That may not be worth the trouble, but it's better, so I'll put it up anyway.
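
To illustrate what "no groupsize" versus "groupsize 128" means, here is a minimal toy sketch in plain Python. It assumes simple round-to-nearest min/max quantization, which is not the GPTQ algorithm that AutoGPTQ runs, and it does not model act-order or the monkey patch. The function names `quantize_rtn` and `mean_sq_error` and the toy weight row are hypothetical, made up for this example.

```python
def quantize_rtn(row, bits=4, group_size=-1):
    """Round-to-nearest min/max quantization; returns the dequantized row.

    group_size=-1 uses one scale and offset for the whole row ("no groupsize").
    Otherwise each block of `group_size` weights gets its own scale and offset.
    This is a simplified stand-in, not GPTQ itself.
    """
    if group_size == -1:
        group_size = len(row)
    levels = 2 ** bits - 1  # 15 steps between min and max for 4-bit
    out = []
    for start in range(0, len(row), group_size):
        group = row[start:start + group_size]
        lo, hi = min(group), max(group)
        scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
        for w in group:
            q = round((w - lo) / scale)  # integer code in [0, levels]
            out.append(lo + q * scale)   # map the code back to a float
    return out


def mean_sq_error(a, b):
    """Mean squared difference between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Toy row: 128 small weights followed by 128 much larger ones.
row = [0.001 * i for i in range(128)] + [0.1 * i for i in range(128)]

err_plain = mean_sq_error(row, quantize_rtn(row, group_size=-1))
err_g128 = mean_sq_error(row, quantize_rtn(row, group_size=128))
print(f"no groupsize: {err_plain:.6f}")
print(f"groupsize 128: {err_g128:.6f}")
```

On this toy row the grouped variant has the lower reconstruction error, because the small weights get their own finer grid instead of sharing one sized for the large weights. The trade-off is that every group stores its own scale and offset, so the grouped file is somewhat larger.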