Add config and readme
Changed files:
- README.md +25 -0
- exl2mes-mythalion13b-4k-32.json +0 -0
README.md
ADDED
---
language:
- en
---

Quantizations for [PygmalionAI/mythalion-13b](https://huggingface.co/PygmalionAI/mythalion-13b) in the [EXL2 format](https://github.com/turboderp/exllamav2)

Quant|VRAM estimate|Additional
---|---|---
[4k_hb8_b8](https://huggingface.co/Beinsezii/Mythalion-13b-EXL2/tree/4k_hb8_b8)|18GB|Recommended!
[4k_hb6_b6](https://huggingface.co/Beinsezii/Mythalion-13b-EXL2/tree/4k_hb6_b6)|15GB|
[4k_hb6_b5](https://huggingface.co/Beinsezii/Mythalion-13b-EXL2/tree/4k_hb6_b5)|13GB|Should fit on 12GB cards with 2k context

Breaking down the names:
- **4k** is calibrated with 4096 context @ 82 rows (the maximum for wikitext), as opposed to the default 2048 context @ 100 rows.
- **hb8** is a head bit depth of 8 bits
- **b8** is an average model weight width of 8.0 bits
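
Under these conventions a branch name can be decoded mechanically. The helper below is a hypothetical sketch of that decoding (`parse_quant_name` is not part of this repo or of exllamav2):

```python
import re

def parse_quant_name(name: str) -> dict:
    """Decode a branch name like '4k_hb8_b8' into its quantization settings.

    <n>k  -> calibration context length in thousands of tokens (4k -> 4096)
    hb<n> -> head bit depth
    b<n>  -> average bits per model weight
    """
    m = re.fullmatch(r"(\d+)k_hb(\d+)_b(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognized quant name: {name}")
    ctx_k, head_bits, avg_bits = map(int, m.groups())
    return {
        "calibration_context": ctx_k * 1024,
        "head_bits": head_bits,
        "avg_weight_bits": float(avg_bits),
    }

print(parse_quant_name("4k_hb6_b5"))
# -> {'calibration_context': 4096, 'head_bits': 6, 'avg_weight_bits': 5.0}
```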

All quantizations were calibrated with [wikitext-2](https://huggingface.co/datasets/wikitext/viewer/wikitext-2-v1/test)

You can run a model calibrated at 2k with a 4k context or vice versa. The actual difference between 2k and 4k calibrations appears to be very small.

VRAM estimates were taken with an extremely long chatlog in [oobabooga webui](https://github.com/oobabooga/text-generation-webui) on a 7900 XTX, using [nvtop](https://github.com/Syllo/nvtop) to monitor **PyTorch usage only**, rounded up. Systems with many extra background processes may use more. Additionally, NVIDIA-based systems with [flash attention 2](https://github.com/Dao-AILab/flash-attention) **will use less VRAM** than estimated here.
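
As a rough sanity check on the table above (my own back-of-envelope arithmetic, not the author's methodology): raw weight storage for a ~13B-parameter model scales linearly with the average bit width, and the remainder of each observed figure is cache, activations, and runtime overhead, which stays roughly constant at ~5GB across the three quants.

```python
def weight_vram_gb(n_params: float, avg_bits: float) -> float:
    """Raw weight storage in GB for a quantized model (weights only)."""
    return n_params * avg_bits / 8 / 1e9

# 13B parameters at the three average bit widths from the table,
# compared against the observed totals (18/15/13 GB).
for bits, observed_gb in [(8.0, 18), (6.0, 15), (5.0, 13)]:
    weights = weight_vram_gb(13e9, bits)
    print(f"b{bits:.0f}: ~{weights:.1f} GB weights, "
          f"~{observed_gb - weights:.1f} GB cache/overhead of {observed_gb} GB observed")
```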

The measurement files are provided in the main branch so you can [make your own quants](https://github.com/turboderp/exllamav2/blob/master/doc/convert.md) at other bit depths without going through the 2-3 hours of measuring.
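
Reusing the measurement file with exllamav2's conversion script might look like the sketch below. The paths and the 4.5-bpw / hb6 choices are illustrative, and the flag names follow exllamav2's convert.md at the time of writing; check the linked docs before running.

```shell
# Build a new quant from the provided measurement file, skipping the
# 2-3 hour measuring pass. All paths here are placeholders.
python convert.py \
    -i /models/mythalion-13b-fp16 \
    -o /tmp/exl2-work \
    -cf /models/mythalion-13b-exl2-4.5bpw \
    -m exl2mes-mythalion13b-4k-32.json \
    -b 4.5 \
    -hb 6
```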
exl2mes-mythalion13b-4k-32.json
ADDED