Update README.md
README.md
CHANGED

---
license: other
---

[GPlatty-30B](https://huggingface.co/lilloukas/GPlatty-30B) merged with bhenrym14's [airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA), quantized to 4-bit.
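
For reference, a merge like this can be reproduced with the PEFT library's `merge_and_unload`. This is only a sketch of the general technique, not necessarily the exact procedure used to build this repo, and the output path is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the FP16 base model (the merge happens in full precision,
# before quantization).
base = AutoModelForCausalLM.from_pretrained(
    "lilloukas/GPlatty-30B", torch_dtype=torch.float16
)

# Apply the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(
    base, "bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA"
)
model = model.merge_and_unload()

# "gplatty-30b-lxctx-merged" is a hypothetical output directory.
model.save_pretrained("gplatty-30b-lxctx-merged")
```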

More info about the LoRA is [here](https://huggingface.co/bhenrym14/airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA). It is an alternative to the SuperHOT 8k LoRA, trained with a LoRA rank of 64 on the airoboros 1.4.1 dataset, with the context extended to 16K.

It was quantized with GPTQ-for-LLaMa using group size 32 and act-order enabled, to keep perplexity as close as possible to the FP16 model.
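
As an illustration of what those settings mean, here is how the same parameters map onto the AutoGPTQ API (a different tool than the GPTQ-for-LLaMa scripts actually used for this repo; model paths and the calibration text are placeholders):

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

pretrained_dir = "gplatty-30b-lxctx-merged"   # placeholder: FP16 merged model
quantized_dir = "gplatty-30b-lxctx-4bit-32g"  # placeholder: output path

quantize_config = BaseQuantizeConfig(
    bits=4,         # 4-bit weights
    group_size=32,  # group size 32, as used for this repo
    desc_act=True,  # act-order true, as used for this repo
)

tokenizer = AutoTokenizer.from_pretrained(pretrained_dir, use_fast=True)

# A real quantization run needs a proper calibration set, not one sentence.
examples = [tokenizer("GPTQ calibrates the quantization on sample text like this.")]

model = AutoGPTQForCausalLM.from_pretrained(pretrained_dir, quantize_config)
model.quantize(examples)
model.save_quantized(quantized_dir, use_safetensors=True)
```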

I highly suggest using ExLlama to avoid VRAM issues.

Use compress_pos_emb = 8 for any context length up to 16384.

If you have two 24 GB GPUs, use the following split to avoid out-of-memory errors at 16384 context:

gpu_split: 8.4,9.6
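
Putting those settings together, here is a minimal loading sketch assuming ExLlama's standalone Python API as shown in its example scripts (`ExLlamaConfig` with `compress_pos_emb`, `max_seq_len`, and the `set_auto_map` helper for the GPU split); in text-generation-webui the same values go in the max_seq_len, compress_pos_emb, and gpu-split fields. The model directory is a placeholder:

```python
import os, glob

from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_directory = "GPlatty-30B-lxctx-GPTQ"  # placeholder: local path to this repo

tokenizer_path = os.path.join(model_directory, "tokenizer.model")
model_config_path = os.path.join(model_directory, "config.json")
model_path = glob.glob(os.path.join(model_directory, "*.safetensors"))[0]

config = ExLlamaConfig(model_config_path)
config.model_path = model_path
config.max_seq_len = 16384      # extended context from the LoRA
config.compress_pos_emb = 8.0   # 16384 / 2048 = 8, as recommended above
config.set_auto_map("8.4,9.6")  # VRAM split for two 24 GB GPUs

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(tokenizer_path)
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Hello,", max_new_tokens=32))
```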