R136a1
/

InfinityKumon-2x7B-GGUF

nsfw

Not-For-All-Audiences

Inference Endpoints

Model card Files Files and versions Community

InfinityKumon-2x7B

GGUF - Imatrix quant of InfinityKumon-2x7B

Another MoE merge from Endevor/InfinityRP-v1-7B and grimjim/kukulemon-7B.

The reason? Because I like InfinityRP-v1-7B so much and wondering if I can improve it even more by merging 2 great models into MoE.

Perplexity

Using llama.cpp/perplexity with private roleplay dataset.

Format	PPL
FP16	3.1748 +/- 0.11928
Q8_0	3.1734 +/- 0.11935
Q6_K	3.1752 +/- 0.11899
Q5_K_M	3.1731 +/- 0.11892
IQ4_NL	3.1752 +/- 0.11943
IQ3_M	3.1773 +/- 0.11528
Q2_K	3.2309 +/- 0.11996

I don't really recomend using Q2_K based on the ppl, the other quants are fine.

Prompt format:

Alpaca or ChatML

Switch: FP16 - GGUF

Downloads last month: 8

GGUF

Model size

12.9B params

Architecture

llama

2-bit

4-bit

5-bit

6-bit

8-bit

Inference API

Unable to determine this model's library. Check the docs .