
llama.cpp conversion of https://huggingface.co/nakodanei/Blue-Orchid-2x7b/

Except for f16 and q8_0, every quant uses merge.imatrix.

merge.imatrix is a merge of kalomaze-group_10_merged.172chunks.imatrix and wiki.train.400chunks.imatrix, which took ~10 min + ~20 min to calculate on my machine.

An imatrix over the full wiki.train would have taken ~10 h.
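For reference, a minimal sketch of how the two imatrix files could have been computed and merged with llama.cpp's imatrix tool. The model/calibration file names are placeholders, and the `--in-file` merge step is an assumption about the tool's options, not the exact commands used for this repo:

```python
import subprocess

MODEL = "Blue-Orchid-2x7b.f16.gguf"  # placeholder local file name

def compute_imatrix(calib_file: str, out_file: str, chunks: int) -> None:
    """Run llama.cpp's imatrix tool over `chunks` chunks of a calibration text."""
    subprocess.run(
        ["./imatrix", "-m", MODEL, "-f", calib_file,
         "-o", out_file, "--chunks", str(chunks)],
        check=True,
    )

# ~10 min: kalomaze's grouped calibration data, 172 chunks (calibration file name assumed)
compute_imatrix("kalomaze-group_10_merged.txt",
                "kalomaze-group_10_merged.172chunks.imatrix", 172)
# ~20 min: first 400 chunks of wiki.train
compute_imatrix("wiki.train.raw", "wiki.train.400chunks.imatrix", 400)

# Merge the two imatrix files into merge.imatrix.
# Assumption: the imatrix tool can combine previously computed data passed via --in-file.
subprocess.run(
    ["./imatrix", "-m", MODEL,
     "--in-file", "kalomaze-group_10_merged.172chunks.imatrix",
     "--in-file", "wiki.train.400chunks.imatrix",
     "-o", "merge.imatrix"],
    check=True,
)
```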

For more info on imatrix handling, see https://github.com/ggerganov/llama.cpp/pull/5302
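As a sketch of how the imatrix is applied at quantization time: llama.cpp's quantize tool takes the imatrix via `--imatrix`, and the f16 GGUF plus output file names below are assumed placeholders, not the exact files in this repo:

```python
import subprocess
from typing import Optional

def quantize(src_f16: str, dst: str, qtype: str, imatrix: Optional[str] = None) -> None:
    """Quantize an f16 GGUF with llama.cpp's quantize tool, optionally guided by an imatrix."""
    cmd = ["./quantize"]
    if imatrix is not None:
        cmd += ["--imatrix", imatrix]
    cmd += [src_f16, dst, qtype]
    subprocess.run(cmd, check=True)

# imatrix-guided quant (used for everything below q8_0 here)
quantize("Blue-Orchid-2x7b.f16.gguf", "Blue-Orchid-2x7b.iq3_xxs.gguf", "IQ3_XXS",
         imatrix="merge.imatrix")
# plain quant without an imatrix (as done for q8_0)
quantize("Blue-Orchid-2x7b.f16.gguf", "Blue-Orchid-2x7b.q8_0.gguf", "Q8_0")
```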

ppl (512 context, wiki.test, 300 chunks)

| quant | ppl (lower is better) |
|---|---|
| f16 (baseline) | 5.8839 +/- 0.05173 |
| q8_0 | 5.8880 +/- 0.05178 |
| q5_k_m | 5.8912 +/- 0.05177 |
| q5_k_m (without-imat) | 5.8893 +/- 0.05174 |
| q4_k_m | 5.9248 +/- 0.05216 |
| q4_k_m (without-imat) | 5.9492 +/- 0.05249 |
| iq3_xxs | 6.1984 +/- 0.05475 |
| iq3_xxs (only-wiki) | 6.1796 +/- 0.05446 |
| iq3_xxs (only-kal) | 6.1984 +/- 0.05475 |
| iq3_xxs (without-imat) | 6.4228 +/- 0.05756 |
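A run along these lines reproduces that kind of measurement with llama.cpp's perplexity tool (context 512, first 300 chunks of wiki.test); the model and dataset paths are placeholders:

```python
import subprocess

# Measure perplexity of one quant at context size 512 over the first 300 chunks of wiki.test.
subprocess.run(
    ["./perplexity",
     "-m", "Blue-Orchid-2x7b.iq3_xxs.gguf",
     "-f", "wiki.test.raw",
     "-c", "512",
     "--chunks", "300"],
    check=True,
)
```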

Interesting observations

Despite merge.imatrix being different from kalomaze-group_10_merged.172chunks.imatrix, both produce the exact same quantized iq3_xxs model file (same hash, checked multiple times).
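A quick way to confirm that two quantized files are byte-identical, as was done for that check (a trivial sketch; the file names are hypothetical):

```python
import hashlib

def sha256(path: str) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()

# Identical digests => the two imatrix variants produced the same iq3_xxs file.
print(sha256("iq3_xxs-merge-imatrix.gguf") == sha256("iq3_xxs-kalomaze-imatrix.gguf"))
```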

q5_k_m has a slightly lower perplexity without the imatrix, but that is probably caused by kalomaze-group_10_merged diverging enough from wiki.
