Update README.md
README.md
CHANGED

---
license: other
---

[GPlatty-30B](https://huggingface.co/lilloukas/GPlatty-30B) merged with bhenrym14's [airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA), quantized to 4-bit.
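
For reference, a merge like this can be reproduced with the PEFT library's `merge_and_unload`. This is only a sketch of the general technique, not necessarily the exact procedure used to build this repo, and the output path is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the FP16 base model (the merge happens in full precision,
# before quantization).
base = AutoModelForCausalLM.from_pretrained(
    "lilloukas/GPlatty-30B", torch_dtype=torch.float16
)

# Apply the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(
    base, "bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA"
)
model = model.merge_and_unload()

# "gplatty-30b-lxctx-merged" is a hypothetical output directory.
model.save_pretrained("gplatty-30b-lxctx-merged")
```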

More info about the LoRA is [here](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA). It is an alternative to the SuperHOT 8k LoRA, trained with a LoRA rank of 64 on the airoboros 1.4.1 dataset, with the context extended to 16K.

It was quantized with GPTQ-for-LLaMa using group size 32 and act-order enabled, to keep perplexity as close as possible to the FP16 model.
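
As an illustration of what those settings mean, here is how the same parameters map onto the AutoGPTQ API (a different tool than the GPTQ-for-LLaMa scripts actually used for this repo; model paths and the calibration text are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "gplatty-30b-lxctx-merged"   # placeholder: FP16 merged model
quantized_dir = "gplatty-30b-lxctx-4bit-32g"  # placeholder: output path

quantize_config = BaseQuantizeConfig(
    bits=4,         # 4-bit weights
    group_size=32,  # group size 32, as used for this repo
    desc_act=True,  # act-order true, as used for this repo
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# A real quantization run needs a proper calibration set, not one sentence.
examples = [tokenizer("GPTQ calibrates the quantization on sample text like this.")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```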

I highly suggest using ExLlama to avoid VRAM issues.

Use compress_pos_emb = 8 for any context length up to 16384.

If you have two 24 GB GPUs, use the following split to avoid out-of-memory errors at 16384 context:

gpu_split: 8.4,9.6
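
Putting those settings together, here is a minimal loading sketch assuming ExLlama's standalone Python API as shown in its example scripts (`ExLlamaConfig` with `compress_pos_emb`, `max_seq_len`, and the `set_auto_map` helper for the GPU split); in text-generation-webui the same values go in the max_seq_len, compress_pos_emb, and gpu-split fields. The model directory is a placeholder:

```python
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "GPlatty-30B-lxctx-GPTQ"  # placeholder: local path to this repo

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)
config.model_path = model_path
config.max_seq_len = 16384      # extended context from the LoRA
config.compress_pos_emb = 8.0   # 16384 / 2048 = 8, as recommended above
config.set_auto_map("8.4,9.6")  # VRAM split for two 24 GB GPUs

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello,", max_new_tokens=32))
```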