For the bandwidth limited ones <3

GGUFs for HanNayeoniee/LHK_DPO_v1

For a general representation of how quantization level influences output quality, check any model card from TheBloke, or see this table. Note those benchmarks were done on Llama models, and are probably not recent. Also I don't know how the MOE architecture influences those results but you got the idea!

So about the model, I just played with it 40min so far (Q5_K_M, ChatML template, TGWUI, ratherly short context size) but from what I saw, this model was really impressive 👏 I should rather say quite astonishing!

[Edit: every quants are now tested and validated]

The coherence seems remarkably well maintained. To illustrate, see this sequence of interactions with the model.

HanNayeoniee/LHK_DPO_v1 was trained via Direct Preference Optimization(DPO) from TomGrc/FusionNet_7Bx2_MoE_14B.

Thanks for the community and sincere congrats to HanNayeoniee and TomGrc!