---
license: other
---
5-bit quantization of airoboros 70b 1.4.1 (https://huggingface.co/jondurbin/airoboros-l2-70b-gpt4-1.4.1), made with ExLlamaV2.

On 2x4090, a context length of 3072 seems to work fine with gpu_split set to 21.5,22.5 and max_attention_size = 1024 ** 2 instead of 2048 ** 2.
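
As a rough sanity check on those settings, the arithmetic can be sketched in plain Python. This does not call exllama at all; the variable names only mirror the settings quoted above. The weight estimate assumes a nominal 70e9 parameters at exactly 5 bits each, which is an assumption for illustration, so the real quantized files will differ somewhat.

```python
# Values quoted in the note above.
gpu_split = [21.5, 22.5]            # GB allotted to each of the two 4090s
max_seq_len = 3072                  # context length reported to work
max_attention_size = 1024 ** 2      # reduced from 2048 ** 2

# Total VRAM budget handed to the loader across both cards.
split_total_gb = sum(gpu_split)

# How much smaller the attention size limit is than 2048 ** 2.
attention_ratio = (2048 ** 2) // max_attention_size

# Assumption: nominal 70e9 parameters at exactly 5 bits per weight.
# This ignores embeddings kept at higher precision, the KV cache and
# activation buffers, so it is only a ballpark figure.
approx_weight_gb = 70e9 * 5 / 8 / 1e9

print(f"gpu_split total: {split_total_gb} GB")
print(f"max_seq_len: {max_seq_len}")
print(f"max_attention_size: {max_attention_size} ({attention_ratio}x smaller than 2048 ** 2)")
print(f"approximate weight size: {approx_weight_gb:.2f} GB")
```

Under these assumptions the weights alone take up most of the split, which would explain why the context length and attention size have to be trimmed to fit.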

A context length of 4096 may be feasible on a single GPU with 48GB of VRAM (such as an A6000).

Tests are welcome.