7.0 bpw (bits per weight) quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), made with ExLlamaV2.
It fits in ~72GB of VRAM at 4K context (67-68GB without CFG, 69GB with CFG), so with GQA and FlashAttention-2 (FA2) there is headroom for longer contexts.
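Loading an EXL2 quant like this with the exllamav2 Python package typically looks like the minimal sketch below. The model directory and sampling values are placeholders, and `load_autosplit` spreads the weights across all visible GPUs; adjust `max_seq_len` for your context budget.

```python
# Minimal sketch: load the quantized model with exllamav2 and generate text.
# The model path and sampling settings are placeholders, not part of this repo.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/airoboros-l2-70b-7.0bpw-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 4096  # 4K context, matching the VRAM figures above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as layers load
model.load_autosplit(cache)               # split ~72GB of weights across GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # placeholder sampling values
settings.top_p = 0.9

# Third argument is the number of new tokens to generate.
print(generator.generate_simple("USER: Hello! ASSISTANT:", settings, 200))
```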