
h2ogpt-research-oig-oasst1-512-30b merged with kaiokendev's 33b SuperHOT 8k LoRA, quantized to 4-bit.

It was created with GPTQ-for-LLaMA using group size 32 and act order set to true, to keep perplexity as close as possible to the FP16 model.

I HIGHLY suggest using ExLlama to avoid some VRAM issues.

Use compress_pos_emb = 4 for any context length up to 8192 tokens.
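As a rough illustration of why the factor is 4: SuperHOT-style linear position interpolation divides each token's position index by the compression factor before the rotary (RoPE) angles are computed, so 8192 positions are squeezed into the 2048-position range the base LLaMA model was trained on (8192 / 4 = 2048). The sketch below is a minimal, pure-Python model of that idea, not ExLlama's actual code; the function name `rope_angles` and the defaults `dim=128` (the LLaMA-30B head dimension) and `base=10000.0` are assumptions for illustration.

```python
def rope_angles(position, dim=128, base=10000.0, compress_pos_emb=1.0):
    """Hypothetical helper: RoPE angles for one token position.

    Assumes linear position interpolation: the position index is divided
    by compress_pos_emb before the standard RoPE frequencies are applied.
    """
    scaled_pos = position / compress_pos_emb
    # One rotation frequency per pair of dimensions, as in standard RoPE.
    return [scaled_pos * base ** (-2 * i / dim) for i in range(dim // 2)]


# The last token of an 8192-token context, compressed by 4, gets the same
# angles as position 2047.75 would get with no compression at all.
compressed = rope_angles(8191, compress_pos_emb=4)
uncompressed = rope_angles(8191 / 4)
print(compressed == uncompressed)
```

The largest angle (i = 0) equals the scaled position itself, which makes it easy to check that every position in an 8192-token window stays below 2048.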

If you have two GPUs with 24 GB of VRAM each, use the following to avoid out-of-memory errors at 8192 context:

gpu_split: 9,21
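The gpu_split values are read here as gigabytes of VRAM to use for model layers on each GPU, in device order (the convention used by text-generation-webui's ExLlama loader). The small sketch below, which assumes exactly that interpretation and two 24 GB cards, just does the arithmetic of how much memory each card keeps free for everything else, such as the cache for a long context. The helpers `parse_gpu_split` and `headroom_gb` are hypothetical and not part of any library.

```python
def parse_gpu_split(split):
    """Hypothetical helper: parse a gpu_split string like "9,21" into GB per GPU."""
    return [float(part) for part in split.split(",")]


def headroom_gb(split, vram_per_gpu_gb=24.0):
    """Hypothetical helper: VRAM (GB) left on each GPU after the layer allocation.

    Assumes every GPU has the same total VRAM (24 GB by default).
    """
    return [vram_per_gpu_gb - gb for gb in parse_gpu_split(split)]


print(headroom_gb("9,21"))
```

With the suggested split, the first card keeps about 15 GB free and the second about 3 GB.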
