---
license: other
---

[WizardLM-Uncensored-SuperCOT-StoryTelling-30b](https://huggingface.co/Monero/WizardLM-Uncensored-SuperCOT-StoryTelling-30b) merged with kaiokendev's [33b SuperHOT 8k LoRA](https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test), quantized to 4-bit.

It was quantized with GPTQ-for-LLaMA using group size 32 and act-order enabled, to keep perplexity as close as possible to the FP16 model.

I HIGHLY suggest using exllama to avoid some VRAM issues.

Use these settings (max_seq_len is your context length):

- If `max_seq_len = 4096`, set `compress_pos_emb = 2`
- If `max_seq_len = 8192`, set `compress_pos_emb = 4`
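
For reference, here is a minimal loading sketch with exllama that applies these values. It follows the structure of exllama's example scripts (importing `ExLlama`, `ExLlamaConfig`, etc. from the repo's `model.py`, `tokenizer.py`, and `generator.py`), so it assumes it is run from an exllama checkout; the model directory path and the prompt are placeholders, and exact attribute names may differ between exllama revisions:

```python
# Minimal exllama loading sketch (assumed to run from an exllama checkout,
# where model.py / tokenizer.py / generator.py live). Paths are placeholders.
import glob
import os

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

# Directory containing config.json, tokenizer.model and the *.safetensors file.
model_directory = "/path/to/this-model"

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)
config.model_path = model_path

# SuperHOT position scaling: pick the matching pair from the list above.
config.max_seq_len = 8192       # or 4096
config.compress_pos_emb = 4.0   # or 2.0 when max_seq_len = 4096

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Once upon a time,", max_new_tokens=128))
```

The same two values can be set in text-generation-webui's exllama loader options if you prefer not to script the load yourself.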