guanaco-33b merged with bhenrym14's airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA, quantized to 4-bit.

More info about the LoRA Here. It is an alternative to the SuperHOT 8k LoRA, trained with LoRA_rank 64 on the airoboros 1.4.1 dataset, with context extended to 16K.

It was created with GPTQ-for-LLaMA using group size 32 and act order true, to keep perplexity as close as possible to that of the FP16 model.

I HIGHLY suggest using exllama to avoid some VRAM issues.

Use compress_pos_emb = 8 for any context length up to 16384.
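To see why a single value of 8 covers the whole range, here is a minimal sketch of linear position interpolation, which is what the "PI" in the LoRA name refers to. It assumes that compress_pos_emb simply divides each token position before the rotary embedding is computed, and that the base LLaMA 33B model has a native context of 2048. The function name `interpolated_positions` and the constant `NATIVE_CTX` are made up for illustration and are not part of exllama.

```python
# Illustrative sketch only: not exllama's actual code.
NATIVE_CTX = 2048  # assumption: native context length of the LLaMA 33B base model


def interpolated_positions(seq_len: int, compress_pos_emb: float) -> list[float]:
    """Return the position indices after dividing by the compression factor."""
    return [i / compress_pos_emb for i in range(seq_len)]


positions = interpolated_positions(16384, 8)
print(positions[-1])               # 2047.875
print(positions[-1] < NATIVE_CTX)  # True
```

With a factor of 8, all 16384 positions land inside the 0 to 2048 range the base model was trained on. Shorter prompts still use the same factor, which matches the recommendation above to keep the value at 8 for any context up to 16384.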

If you have two 24 GB VRAM GPUs, to avoid Out of Memory errors at 16384 context, use:

gpu_split: 8.8,9.2

