
guanaco-33b merged with bhenrym14's airoboros-33b-gpt4-1.4.1-lxctx-PI-16384-LoRA, quantized to 4-bit.

More info about the LoRA is available here. It is an alternative to the SuperHOT 8k LoRA: trained with LoRA rank 64 on the airoboros 1.4.1 dataset, with the context extended to 16K.

It was quantized with GPTQ-for-LLaMA using group size 32 and act-order enabled, to keep perplexity as close as possible to the FP16 model.
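For reference, a command along these lines would reproduce those settings with GPTQ-for-LLaMA's llama.py script. The paths and output filename are placeholders, and the exact flag set can vary between forks, so treat this as a sketch rather than the exact command used:

```
python llama.py /path/to/merged-fp16-model c4 \
    --wbits 4 \
    --groupsize 32 \
    --act-order \
    --true-sequential \
    --save_safetensors airoboros-33b-4bit-32g.safetensors
```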

I HIGHLY suggest using exllama to avoid VRAM issues.

Use compress_pos_emb = 8 for any context length up to 16384.
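As a rough illustration, loading the model with exllama's Python API might look like the sketch below. The model directory name is a placeholder and the calls follow exllama's example scripts, so adjust to your local checkout:

```python
from model import ExLlama, ExLlamaCache, ExLlamaConfig
from tokenizer import ExLlamaTokenizer
from generator import ExLlamaGenerator

model_dir = "models/airoboros-33b-gpt4-1.4.1-PI-16384-4bit-32g"  # placeholder path

config = ExLlamaConfig(model_dir + "/config.json")
config.model_path = model_dir + "/model.safetensors"
config.max_seq_len = 16384      # extended context window
config.compress_pos_emb = 8.0   # 16384 / 2048 native LLaMA context = 8

model = ExLlama(config)
tokenizer = ExLlamaTokenizer(model_dir + "/tokenizer.model")
cache = ExLlamaCache(model)
generator = ExLlamaGenerator(model, tokenizer, cache)

print(generator.generate_simple("Once upon a time,", max_new_tokens=64))
```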

If you have two 24 GB GPUs, use the following split to avoid out-of-memory errors at 16384 context:

gpu_split: 8.8,9.2
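In exllama's Python API, the same split can be applied before the model is constructed. This continues the sketch above and assumes exllama's set_auto_map helper, which takes per-GPU VRAM allocations in GB matching the gpu_split string:

```python
# Allocate ~8.8 GB of weights on GPU 0 and ~9.2 GB on GPU 1
# (extends the config object from the sketch above).
config.set_auto_map("8.8,9.2")
model = ExLlama(config)
```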
