---
license: apache-2.0
tags:
- not-for-all-audiences
- writing
- roleplay
- gguf
- gguf-imatrix
base_model:
- nakodanei/Blue-Orchid-2x7b
model_type: mixtral
quantized_by: Green-Sky
language:
- en
---
llama.cpp conversion of https://huggingface.co/nakodanei/Blue-Orchid-2x7b/

Except for f16 and q8_0, every quant uses `merge.imatrix`.

`merge.imatrix` is a merge of `kalomaze-group_10_merged.172chunks.imatrix` and `wiki.train.400chunks.imatrix`, which took ~10min + ~20min to calculate on my machine. The full wiki.train would have taken 10h. (A command sketch for reproducing this is at the bottom of this card.)

For more info on imatrix handling see https://github.com/ggerganov/llama.cpp/pull/5302

### ppl (wiki.test, 512-token context, 300 chunks)

| quant                  | ppl (lower is better) |
|------------------------|-----------------------|
| f16 (baseline)         | 5.8839 +/- 0.05173    |
| q8_0                   | 5.8880 +/- 0.05178    |
| q5_k_m                 | 5.8912 +/- 0.05177    |
| q5_k_m (without-imat)  | 5.8893 +/- 0.05174    |
| q4_k_m                 | 5.9248 +/- 0.05216    |
| q4_k_m (without-imat)  | 5.9492 +/- 0.05249    |
| iq3_xxs                | 6.1984 +/- 0.05475    |
| iq3_xxs (only-wiki)    | 6.1796 +/- 0.05446    |
| iq3_xxs (only-kal)     | 6.1984 +/- 0.05475    |
| iq3_xxs (without-imat) | 6.4228 +/- 0.05756    |

### Interesting observations

Despite `merge.imatrix` being different from `kalomaze-group_10_merged.172chunks.imatrix`, both produce the exact same quantized iq3_xxs model file (same hash, checked multiple times). This matches the table above, where the only-kal row is identical to the merged row.

q5_k_m actually has a slightly lower perplexity *without* the imatrix. That is probably caused by kalomaze-group_10_merged diverging enough from wiki, which is also what the perplexity here is measured on.
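
### Reproduction sketch

A minimal sketch of how imatrix files like the ones above can be computed and merged. The file names are the ones from this card; the tool is llama.cpp's `imatrix` example (shipped as `llama-imatrix` in newer builds), and the `--in-file` based merging is an assumption about your build, so check `imatrix --help` first.

```sh
# compute an imatrix from a calibration text, limited to the first
# 400 chunks of ~512 tokens (-ngl offloads layers to the GPU; drop it
# for CPU-only builds)
./imatrix -m Blue-Orchid-2x7b-f16.gguf \
    -f wiki.train.raw --chunks 400 \
    -o wiki.train.400chunks.imatrix -ngl 99

# combine two existing imatrix files into one (assumes a build where
# imatrix accepts multiple --in-file arguments)
./imatrix -m Blue-Orchid-2x7b-f16.gguf \
    --in-file kalomaze-group_10_merged.172chunks.imatrix \
    --in-file wiki.train.400chunks.imatrix \
    -o merge.imatrix
```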
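
Quantizing with the imatrix and re-running the perplexity measurement from the table then looks roughly like this, using llama.cpp's `quantize` and `perplexity` tools (`llama-quantize` / `llama-perplexity` in newer builds); the output file name is illustrative:

```sh
# quantize using the importance matrix
# (omit --imatrix for the without-imat variants)
./quantize --imatrix merge.imatrix \
    Blue-Orchid-2x7b-f16.gguf Blue-Orchid-2x7b-iq3_xxs.gguf IQ3_XXS

# perplexity over wiki.test: 512-token context, first 300 chunks
./perplexity -m Blue-Orchid-2x7b-iq3_xxs.gguf \
    -f wiki.test.raw -c 512 --chunks 300
```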