Results of testing.

#1
by Rance47 - opened

IMHO the model seems smarter when it comes to accessing obscure and lesser-known data, like biographies of film characters or plots of books, compared to MiquMaid-v2-70B and Midnight Miqu, but it has also become more censorious and opinionated than the other two in my experience, to the point where even under jailbreaks it influences the output at times. It perhaps loses to Midnight Miqu a bit when it comes to logic, but only a bit.

Thanks for your feedback!

Any 120Bs worth testing this new merge method on? Goliath/Miqu maybe? Dolphin?

@Iommed I tried SLERP-merging Xwin and Euryale with Miqu, Midnight-Miqu style, and then merging them together into Miqu-Goliath. The results of my tests were a bit disappointing:
image.png
While the resulting models became significantly better at stylized writing (S column), they lost the ability to write poems about difficult topics (P column). Miqu's positivity bias ruins the original charm of Goliath imo. I am not planning on uploading the weights, so if @Undi95 or someone else wants to recreate what I did and upload it, go ahead.
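
For anyone unfamiliar with what a SLERP merge does per tensor, here is a minimal sketch, assuming each weight tensor has been flattened into a plain list of floats. The function name `slerp` and the `eps` threshold are my own choices for illustration; this is not the code of any particular merge tool, and real tools handle tensor shapes, dtypes, and per-layer interpolation factors that are left out here.

```python
import math

def slerp(t, a, b, eps=1e-8):
    """Spherically interpolate between two flat weight vectors a and b.

    t=0 returns a, t=1 returns b. Falls back to plain linear
    interpolation when the vectors are (nearly) parallel or zero.
    """
    linear = [(1 - t) * x + t * y for x, y in zip(a, b)]
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a < eps or norm_b < eps:
        return linear
    # Angle between the two vectors, with the cosine clamped for safety.
    cos_omega = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        return linear
    # Standard SLERP weights applied to the original (unnormalized) vectors.
    wa = math.sin((1 - t) * omega) / sin_omega
    wb = math.sin(t * omega) / sin_omega
    return [wa * x + wb * y for x, y in zip(a, b)]

halfway = slerp(0.5, [1.0, 0.0], [0.0, 1.0])
```

Unlike a plain average, the halfway point between two orthogonal unit vectors keeps unit length, which is the usual argument for SLERP over linear merging.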


The problem with Miqu is that it uses a different rope theta value than other Llama2 models, so if you merge Llama2 into Miqu, most of the time it will be shitty.

The only way I found to mitigate this a little is to put as little of the Llama2 model in as possible, and preferably to use fine-tunes made directly on Miqu for merging.
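
A quick way to spot this kind of mismatch before merging is to compare the `rope_theta` field across the models' `config.json` files. The sketch below assumes the configs have already been loaded into dicts; the helper name `rope_theta_mismatches`, the model labels, and the numeric values are illustrative assumptions, so read the real values from each model's own config.

```python
def rope_theta_mismatches(configs, default=10000.0):
    """Return models whose rope_theta differs from the first model's.

    configs: dict mapping a model label to its config dict
             (for example, the parsed contents of config.json).
    default: value assumed when a config omits rope_theta (an assumption).
    """
    thetas = {
        name: float(cfg.get("rope_theta", default))
        for name, cfg in configs.items()
    }
    # The first entry is treated as the reference (base) model.
    reference = next(iter(thetas.values()))
    return {name: th for name, th in thetas.items() if th != reference}

# Illustrative values only; check the actual config.json files.
example_configs = {
    "miqu-base": {"rope_theta": 1000000.0},
    "miqu-finetune": {"rope_theta": 1000000.0},
    "llama2-finetune": {"rope_theta": 10000.0},
}
mismatched = rope_theta_mismatches(example_configs)
```

Here `mismatched` would contain only the `llama2-finetune` entry, flagging it as a risky merge partner under the reasoning above.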

The other problem with Miqu is that the FP16 comes from a Q5, and I have a really hard time finetuning it precisely. Plus, some LASER-RMT tools give an "infinity" perplexity value on some parts of some layers.

So yeah, I don't think I will do that either; it's unstable shit I already tried on my side hahaha.

It's the same shit I currently experience with Mistral 0.1 and Mistral 0.2: sometimes I can merge 0.2 with an old 0.1 model and it works, sometimes it's just garbage.

I tested Miqu-MS's UGI and it was one of the most knowledgeable local models I've tested.

Sidenote, I've tested over a dozen Miqu fine-tunes, but none of them have been fully uncensored yet. They have all refused to answer at least 1 question I ask (the highest W/10 among them is 8). I guess Miqu has certain strongly ingrained refusals, like Qwen (but not as bad).


Yooo, nice!
Thanks for the test!
