---
base_model: https://huggingface.co/Phind/Phind-CodeLlama-34B-v2
inference: false
license: llama2
model_creator: https://huggingface.co/Phind
model_name: Phind-Codellama-34B-v2
model_type: llama
quantized_by: latimar
---

# Phind-CodeLlama-34B-v2 EXL2

Weights of [Phind-CodeLlama-34B-v2](https://huggingface.co/Phind/Phind-CodeLlama-34B-v2) converted
to [EXL2](https://github.com/turboderp/exllamav2#exl2-quantization) format.

Converted with the ExllamaV2 [convert.py](https://github.com/turboderp/exllamav2/blob/master/convert.py) script,
exllamav2 [commit](https://github.com/turboderp/exllamav2/commit/31f31e1b08eeccf4a5ab31fd202ef3100dce8d22).

| BPW (hb=8) | Human-Eval | Evol-Ins PPL | Wiki PPL | File Size (GiB) |
| ---------- | ---------- | ------------ | -------- | --------------- |
| 2.55       | 0.402439   | 2.0944       | 18.9843  | 10.62           |
| 3.0        | 0.664634   | 2.0600       | 11.2096  | 12.36           |
| 4.625      | 0.701219   | 2.0401       | 6.7243   | 18.63           |
| 5.0        | 0.670731   | 2.0391       | 6.6956   | 20.09           |

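To give one of these quants a quick try, the same repository's `test_inference.py` can also generate from a prompt. A minimal sketch (the `-p`/`-t` prompt and token-count flags are assumed from the script's CLI):

```
# quick generation sanity check; -p (prompt) and -t (tokens to generate) are assumed flags of test_inference.py
python test_inference.py -m ${MODEL_DIR_EXL} -p "def fibonacci(n):" -t 128
```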

## Datasets used for calibration and PPL measurement

* [Calibration](https://huggingface.co/datasets/rombodawg/2XUNCENSORED_MegaCodeTraining188k)
* [Wiki](https://huggingface.co/datasets/wikitext/blob/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet)
* [Evol-Ins](https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/blob/refs%2Fconvert%2Fparquet/default/train/0000.parquet)

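The Wiki and Evol-Ins links above point at raw parquet files; one way to fetch them locally for the PPL runs (a sketch assuming Hugging Face's usual `/resolve/` download path in place of `/blob/`):

```
# local filenames are arbitrary; point ${PPL_DATASET} at whichever file you want to evaluate
wget -O wiki.parquet "https://huggingface.co/datasets/wikitext/resolve/refs%2Fconvert%2Fparquet/wikitext-2-v1/validation/0000.parquet"
wget -O evol-ins.parquet "https://huggingface.co/datasets/nickrosh/Evol-Instruct-Code-80k-v1/resolve/refs%2Fconvert%2Fparquet/default/train/0000.parquet"
```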

### Conversion

Conversion arguments:

```
convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL} -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
```

The `2.55` quant was converted using even more calibration rows: `-r 400 -mr 64`.
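For reference, the full set of quants listed above can be produced with a loop over target bitrates. This is only a sketch reusing the arguments from the command above; a fresh `${WIP_DIR}` per run and the 2.55-specific `-r 400 -mr 64` override are assumed:

```
# sketch: one conversion per target bitrate, output to a per-BPW directory
# (the 2.55 run additionally used -r 400 -mr 64)
for BPW in 2.55 3.0 4.625 5.0; do
    python convert.py -i ${MODEL_DIR_FP16} -o ${WIP_DIR} -cf ${MODEL_DIR_EXL}-${BPW} \
        -c ${CALIBRATION_DATASET} -r 200 -mr 32 -l 4096 -ml 4096 -hb 8 -b ${BPW}
done
```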

### Perplexity

Perplexity was measured with the [test_inference.py](https://github.com/turboderp/exllamav2/blob/master/test_inference.py) script:

```
test_inference.py -m ${MODEL_DIR_EXL} -ed ${PPL_DATASET}
```
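For context, the PPL columns in the table are standard token-level perplexity over the evaluation file (lower is better):

$$\mathrm{PPL} = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p_\theta\left(x_i \mid x_{<i}\right)\right)$$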

### Human-Eval

For reference, Phind reports that the original model achieves a **73.8** Human-Eval score.

Unfortunately, the FP16/INT8 weights of this model won't fit on my RTX 4090, but the FP16 weights quantized to NF4 do fit,
so I generated samples with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/tf.human-eval.py) script:

```
python tf.human-eval.py -m ${MODEL_DIR_FP16} -o nf4-samples.jsonl
```

The NF4 variant gives us **0.70731707**.

Samples for the Human-Eval scores of the EXL2 quants were generated with [this](https://github.com/epicfilemcnulty/llm-tools/blob/main/eval/exl2.human-eval.py)
script like this:

```
python exl2.human-eval.py -m ${MODEL_DIR_EXL2} -c 4096 ${BPW}-samples.jsonl
```
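The Human-Eval numbers in the table are pass@1 scores computed from these sample files. The scoring step is separate from sample generation; assuming the official [human-eval](https://github.com/openai/human-eval) harness is the scorer, it looks roughly like:

```
# assumption: the official OpenAI human-eval harness is used to score the generated samples
git clone https://github.com/openai/human-eval
pip install -e human-eval

# prints pass@1 for the chosen samples file
evaluate_functional_correctness ${BPW}-samples.jsonl
```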