---
inference: false
language:
- en
pipeline_tag: text-generation
tags:
- Yi
- llama
- llama-2
license: other
license_name: yi-license
license_link: LICENSE
datasets:
- jondurbin/airoboros-2.2.1
- lemonilia/LimaRP
---
|
# airoboros-2.2.1-limarpv3-y34b-exl2 |
|
|
|
ExLlamaV2 quant of [Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b](https://huggingface.co/Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b)
|
|
|
Branches (a sketch for downloading a specific branch follows this list):

- main: `measurement.json` calculated with 2048-token calibration rows from the PIPPA dataset
- 4.65bpw-h6: 4.65 decoder bits per weight, 6 head bits
  - ideal for 24 GB GPUs at 8k context (on my 24 GB Windows setup with Flash Attention 2, peak VRAM usage during inference with exllamav2_hf was around 23.4 GB, with 0.9 GB in use at baseline)
- 6.0bpw-h6: 6.0 decoder bits per weight, 6 head bits
  - ideal for setups with more than 24 GB of VRAM
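Each quant lives on its own branch, so pass the branch name as the `revision` when downloading. Below is a minimal sketch, assuming the `huggingface_hub` and `exllamav2` packages are installed; the repo id is a placeholder for this card's repo, the sampler values are illustrative, and the loading calls follow the exllamav2 example scripts, so they may differ between library versions.

```python
# Minimal sketch: download a specific quant branch and run a short generation.
# Loading calls follow the exllamav2 example scripts and may vary by version.
from huggingface_hub import snapshot_download
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Assumption: replace with this card's actual repo id.
REPO_ID = "Doctor-Shotgun/airoboros-2.2.1-limarpv3-y34b-exl2"

# Each quant is a branch of the repo, so select it via `revision`.
model_dir = snapshot_download(repo_id=REPO_ID, revision="4.65bpw-h6")

config = ExLlamaV2Config()
config.model_dir = model_dir
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)  # lazy cache lets load_autosplit size it
model.load_autosplit(cache)               # split weights across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8                # illustrative sampler values
settings.top_p = 0.9

print(generator.generate_simple("Once upon a time,", settings, num_tokens=128))
```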