Description
ExLlamaV2 quant of NeverSleep/Nethena-20B
3 bits per weight (BPW), head bits set to 8
Prompt template: Alpaca
Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{prompt}
### Response:
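As a rough illustration of the template above, here is a minimal Python sketch that fills in the `{prompt}` slot. The helper name `build_alpaca_prompt` is hypothetical, and the blank lines between sections are an assumption based on the commonly used Alpaca layout, since the card does not show the exact spacing.

```python
# Alpaca-style template as listed in this card.
# Assumption: blank lines separate the sections, as in the common Alpaca layout.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n"
    "{prompt}\n\n"
    "### Response:\n"
)


def build_alpaca_prompt(prompt: str) -> str:
    """Return the full prompt text for one instruction (hypothetical helper)."""
    # Strip surrounding whitespace so the section markers stay on their own lines.
    return ALPACA_TEMPLATE.format(prompt=prompt.strip())


if __name__ == "__main__":
    print(build_alpaca_prompt("Write a short poem about autumn."))
```

The resulting string is what you would pass to your loader or frontend as the raw prompt; the model's reply is expected to continue after `### Response:`.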
VRAM
My VRAM usage with 20B models is:
| Bits per weight | Context | VRAM |
|---|---|---|
| 6 bpw | 4k | 24 GB |
| 4 bpw | 4k | 18 GB |
| 4 bpw | 8k | 24 GB |
| 3 bpw | 4k | 16 GB |
| 3 bpw | 8k | 21 GB |
These figures are rounded up rather than exact, and they were measured on a Windows machine.
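If you want to check which of these configurations fit a given card, the measurements can be put into a small lookup. This is only a sketch: the function name `fitting_configs` is hypothetical, the numbers are copied from the table as-is, and your own usage may differ depending on OS, loader, and whatever else is using the GPU.

```python
# (bits per weight, context in thousands of tokens, VRAM in GB),
# copied from the measurements in this card; rounded, not exact.
MEASURED = [
    (6, 4, 24),
    (4, 4, 18),
    (4, 8, 24),
    (3, 4, 16),
    (3, 8, 21),
]


def fitting_configs(vram_gb: float) -> list[tuple[int, int, int]]:
    """Return measured configs whose VRAM use is within vram_gb.

    Results are ordered by bits per weight (highest first), then by
    context length (longest first). Hypothetical helper for illustration.
    """
    fits = [row for row in MEASURED if row[2] <= vram_gb]
    return sorted(fits, key=lambda row: (-row[0], -row[1]))


if __name__ == "__main__":
    for bpw, ctx_k, gb in fitting_configs(22):
        print(f"{bpw} bpw at {ctx_k}k context: about {gb} GB")
```

Because the table lists rounded-up totals, a configuration that lands exactly on your card's capacity may still be tight in practice.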