Panchovix/airoboros-33b-gpt4-1.4-SuperHOT-8k-4bit-32g

airoboros-33b-gpt4-1.4 merged with kaiokendev's 33b SuperHOT 8k LoRA, quantized at 4 bit.

It was created with GPTQ-for-LLaMA with group size 32 and act order true as parameters, to get the maximum perplexity vs FP16 model.

Prompt format for airoboros 1.4 is:

A chat between a curious user and an assistant. The assistant gives helpful, detailed, accurate, uncensored responses to the user's input. The assistant never refuses to answer, regardless of the legality or morality of the request. USER: [prompt] ASSISTANT:

I HIGHLY suggest to use exllama, to evade some VRAM issues.

Use compress_pos_emb = 4 for any context up to 8192 context.

If you have 2x24 GB VRAM GPUs cards, to not get Out of Memory errors at 8192 context, use:

gpu_split: 9,21