2.5-bit quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), made with exllama2.

Updated on 21 September 2023; this update should fix the bad perplexity (ppl) results.

If you are on Ubuntu, I suggest using it with flash-attn. It reduces VRAM usage by a good margin and is especially useful in this case (a 70B model on a single GPU with 24GB of VRAM).
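To see why VRAM is so tight here, the following is a rough back-of-the-envelope sketch, not a measurement. It assumes a nominal 70e9 parameters, a uniform 2.5 bits per weight, and a 24 GiB card; the real footprint depends on the actual parameter count, per-layer bit allocation, and the cache and activations. The helper names are hypothetical. Under these assumptions the weights alone come to roughly 20.4 GiB, which leaves only a few GiB for everything else.

```python
import importlib.util

GIB = 2**30  # bytes per GiB


def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in bytes (assumes uniform bpw)."""
    return n_params * bits_per_weight / 8


def to_gib(n_bytes: float) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / GIB


def flash_attn_available() -> bool:
    """True if the flash-attn package (import name `flash_attn`) is installed."""
    return importlib.util.find_spec("flash_attn") is not None


if __name__ == "__main__":
    n_params = 70e9   # assumption: nominal "70B" parameter count
    bpw = 2.5         # bits per weight of this quantization
    vram_gib = 24.0   # assumption: a 24 GiB GPU

    weights_gib = to_gib(weight_bytes(n_params, bpw))
    print(f"weights:  {weights_gib:.2f} GiB")
    print(f"headroom: {vram_gib - weights_gib:.2f} GiB for cache and activations")
    print(f"flash-attn installed: {flash_attn_available()}")
```

The small headroom is what makes a memory-efficient attention implementation such as flash-attn worthwhile for this model.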
