---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---

# Phind-CodeLlama-34B-v2 EXL2

Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

Converted with the ExllamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script,
exllamav2 [commit](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

| BPW (hb=8) | Human-Eval | Evol-Ins PPL | Wiki PPL | File Size (GiB) |
| ---------- | ---------- | ------------ | -------- | --------------- |
| 2.55       | 0.402439   | 2.0944       | 18.9843  | 10.62           |
| 3.0        | 0.664634   | 2.0600       | 11.2096  | 12.36           |
| 4.625      | 0.701219   | 2.0401       | 6.7243   | 18.63           |
| 5.0        | 0.670731   | 2.0391       | 6.6956   | 20.09           |

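To give one of these quants a quick try, the same repository's `test_inference.py` can also generate from a prompt. A minimal sketch (the `-p`/`-t` prompt and token-count flags are assumed from the script's CLI):

```
# quick generation sanity check; -p (prompt) and -t (tokens to generate) are assumed flags of test_inference.py
python test_inference.py -m ${MODEL_DIR_EXL} -p "def fibonacci(n):" -t 128
```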

## Datasets used for calibration and PPL measurement

* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)

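The Wiki and Evol-Ins links above point at raw parquet files; one way to fetch them locally for the PPL runs (a sketch assuming Hugging Face's usual `/resolve/` download path in place of `/blob/`):

```
# local filenames are arbitrary; point ${PPL_DATASET} at whichever file you want to evaluate
wget -O wiki.parquet "https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet"
wget -O evol-ins.parquet "https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"
```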

### Conversion

Conversion arguments:

```
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
```

The `2.55` quant was converted using even more calibration rows: `-r 400 -mr 64`.
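For reference, the full set of quants listed above can be produced with a loop over target bitrates. This is only a sketch reusing the arguments from the command above; a fresh `${WIP_DIR}` per run and the 2.55-specific `-r 400 -mr 64` override are assumed:

```
# sketch: one conversion per target bitrate, output to a per-BPW directory
# (the 2.55 run additionally used -r 400 -mr 64)
for BPW in 2.55 3.0 4.625 5.0; do
    python convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL}-${BPW} \
        -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
done
```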

### Perplexity

Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:

```
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
```
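For context, the PPL columns in the table are standard token-level perplexity over the evaluation file (lower is better):

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)$$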

### Human-Eval

For reference, Phind reports that the original model achieves a **73.8** Human-Eval score.

Unfortunately, the FP16/INT8 weights of this model won't fit on my RTX 4090, but the FP16 weights quantized to NF4 do fit,
so I generated samples with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:

```
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
```

The NF4 variant gives us **0.70731707**.

Samples for the Human-Eval scores of the EXL2 quants were generated with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
script like this:

```
python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 ${BPW}-samples.jsonl
```
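The Human-Eval numbers in the table are pass@1 scores computed from these sample files. The scoring step is separate from sample generation; assuming the official [human-eval](https://github.com/openai/human-eval) harness is the scorer, it looks roughly like:

```
# assumption: the official OpenAI human-eval harness is used to score the generated samples
git clone https://github.com/openai/human-eval
pip install -e human-eval

# prints pass@1 for the chosen samples file
evaluate_functional_correctness ${BPW}-samples.jsonl
```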