---
inference: false
language:
- en
pipeline_tag: text-generation
tags:
- Yi
- llama
- llama-2
license: other
license_name: yi-license
license_link: LICENSE
datasets:
- jondurbin/airoboros-2.2.1
- lemonilia/LimaRP
---
# airoboros-2.2.1-limarpv3-y34b-exl2

Exllama v2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggingface.co/Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b)

Branches:
- main: `measurement.json` calculated with 2048-token calibration rows on PIPPA
- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
  - ideal for 24 GB GPUs at 8k context (on my 24 GB Windows setup with Flash Attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4 GB, with 0.9 GB used at baseline)
- 6.0bpw-h6: 6 decoder bits per weight, 6 head bits
  - ideal for large (>24 GB) VRAM setups
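As a rough sanity check on why 4.65 bpw fits a 24 GB card, you can estimate the weight-storage footprint from the parameter count (34B, per the model name) and the bits-per-weight figure. This is a back-of-envelope sketch only; it ignores the KV cache, activations, and quantization overhead, which is why the observed peak usage at 8k context is several GB higher:

```python
def approx_weight_vram_gb(n_params_billion: float, bpw: float) -> float:
    """Rough weight-storage footprint in GiB for a quantized model:
    parameters * bits-per-weight, converted from bits to GiB."""
    return n_params_billion * 1e9 * bpw / 8 / 1024**3

# 34B parameters at 4.65 bits per weight: ~18.4 GiB for weights alone,
# leaving some headroom on a 24 GB card for the KV cache at 8k context.
print(round(approx_weight_vram_gb(34, 4.65), 1))  # → 18.4

# The 6.0bpw branch needs ~23.7 GiB for weights before any cache,
# which is why it targets >24 GB setups.
print(round(approx_weight_vram_gb(34, 6.0), 1))  # → 23.7
```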