7.0 bpw (bits per weight) quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), made with ExLlamaV2.
It fits in ~72GB of VRAM at 4K context (67-68GB without CFG, 69GB with CFG), so with GQA and FlashAttention-2 (FA2) there is headroom for longer contexts.
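Loading an EXL2 quant like this with the exllamav2 Python package typically looks like the minimal sketch below. The model directory and sampling values are placeholders, and `load_autosplit` spreads the weights across all visible GPUs; adjust `max_seq_len` for your context budget.

```python
# Minimal sketch: load the quantized model with exllamav2 and generate text.
# The model path and sampling settings are placeholders, not part of this repo.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/path/to/airoboros-l2-70b-7.0bpw-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 4096  # 4K context, matching the VRAM figures above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # allocate the cache as layers load
model.load_autosplit(cache)               # split ~72GB of weights across GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.7  # placeholder sampling values
settings.top_p = 0.9

# Third argument is the number of new tokens to generate.
print(generator.generate_simple("USER: Hello! ASSISTANT:", settings, 200))
```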