---
license: cc-by-sa-4.0
datasets:
- mc4
- cc100
- oscar
- togethercomputer/RedPajama-Data-1T
language:
- ja
- en
base_model: meta-llama/Llama-2-70b-hf
pipeline_tag: text-generation
tags:
- llama
- llama-2
---

This repository contains the exl2 quants of [karakuri-ai/karakuri-lm-70b-v0.1](https://huggingface.co/karakuri-ai/karakuri-lm-70b-v0.1), calibrated using [the default dataset built by exllamav2](https://github.com/turboderp/exllamav2/blob/3b0f5230e9619fc0ddf1f302785049d10a65fe26/conversion/tokenize.py#L55).

Compatible with [exllamav2](https://github.com/turboderp/exllamav2) version 0.0.11 and later. For optimal model loading, it is recommended to use [tabbyAPI](https://github.com/theroyallab/tabbyAPI).
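
If you go the tabbyAPI route, pointing it at a downloaded quant comes down to a few lines in its `config.yml`. The excerpt below is a sketch based on tabbyAPI's sample config; key names and defaults may differ between versions, so check the `config_sample.yml` shipped with your install:

```yaml
# Hypothetical excerpt of tabbyAPI's config.yml (verify against config_sample.yml)
model:
  # Directory that holds model folders
  model_dir: models
  # Folder name of the downloaded quant inside model_dir
  model_name: karakuri-lm-70b-v0.1-exl2
```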

The measurement file is stored in the `main` branch, and each quant is stored in its own branch.
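
Because each quant lives in its own branch, you can fetch a single quant without downloading the others. A single-branch `git clone` works, as shown below with the `4.0bpw-h6` branch as an example; `huggingface-cli download` with `--revision` should achieve the same:

```shell
# Clone only the 4.0bpw-h6 quant branch, skipping all other branches
git clone --single-branch --branch 4.0bpw-h6 \
    https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2
```

Note that the repository's LFS files are large, so make sure `git-lfs` is installed before cloning.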

The table below presents the calibration perplexity and [wikitext-103-v1](https://huggingface.co/datasets/wikitext) test perplexity for all provided quants.

| Quant                                                                                   | Calibration Perplexity | wikitext-103-v1 Test Perplexity |
|-----------------------------------------------------------------------------------------|------------------------|---------------------------------|
| [2.4bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/2.4bpw-h6)   | 8.4726                 | 7.1337                          |
| ~~2.65bpw-h6~~*                                                                         | ~~8.1743~~             | ~~8.7475~~                      |
| [2.65bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/2.65bpw-h6) | 8.0901                 | 6.5724                          |
| [3.0bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/3.0bpw-h6)   | 7.9927                 | 6.4607                          |
| [4.0bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/4.0bpw-h6)   | 7.6440                 | 5.8014                          |
| [4.65bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/4.65bpw-h6) | 7.5872                 | 5.7112                          |
| [5.0bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/5.0bpw-h6)   | 7.5745                 |                                 |
| [6.0bpw-h6](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/6.0bpw-h6)   | 7.5616                 |                                 |
| [8.0bpw-h8](https://huggingface.co/MatrixC7/karakuri-lm-70b-v0.1-exl2/tree/8.0bpw-h8)   | 7.5604                 |                                 |

*: The first 2.65bpw-h6 quant has been deprecated due to its unexpectedly high wikitext-103-v1 test perplexity.