Best prose in a model I've ever seen.

#5
by Dsol58 - opened

I have been following along with LLMs since the Llama 2 days and it's amazing to see how far they've come. I've always been a big fan of your models, and I prefer larger models around 70B and 123B. I have to say that this version has shown some of the best prose for its weight class and above, even at Min P 0.1. The only drawbacks of this model are its size and metaknowledge. One perk of Llama models has always been their ability to draw from random outside sources even without character prompting, which would be great if it weren't for the GPT-isms. If it were ever possible to replicate this with 22B or 123B, I'd be stoked to see it.

I have UnslopSmall v1 on my page but I'm not sure if it's as good. Nemo and Small have different archs.

I haven't put my finger on it, but yes, UnslopSmall v1 leaves much to be desired. If I get the chance I could try to compare them.

I tried the official Cydonia versions, the 2k, the 2l, and UnslopSmall. To me, Cydonia 2k felt the best; subsequent versions are very noticeably weaker (all at Q6 precision, for reference).
Edit: to be fair, I haven't tested 2m/unslop as thoroughly as the others yet, so at this point it's more a feeling than a fact.

I also feel like not turning the Metharme keywords into tokens is really hurting the model. If you look at it, the Metharme format is structurally very similar to Mistral's (with a system token on top).
[inst]user input[/inst]model output (/s token)
<|user|>user input<|model|>model output (/s token)
Not leveraging that fact during fine-tuning seems like such a waste. If I know anything about Mistral's various architectures, it's that they are in love with very consistent formatting.
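
To make the comparison above concrete, here is a rough Python sketch that builds one turn in each format and reduces both to the same skeleton. The helper names (`mistral_turn`, `metharme_turn`, `skeleton`) are made up for illustration, and the tag strings are plain text here. Real Mistral tokenizers write `[INST]` in uppercase and differ in whitespace between versions, which this ignores, and nothing below touches an actual tokenizer or says how the model was trained.

```python
# Hypothetical helpers: plain string templates only, no real tokenizer involved.
EOS = "</s>"  # the "(/s token)" from the post


def mistral_turn(user: str, model: str) -> str:
    """One user/model exchange in Mistral-style instruct formatting."""
    return f"[INST]{user}[/INST]{model}{EOS}"


def metharme_turn(user: str, model: str, system: str = "") -> str:
    """One exchange in Metharme-style formatting, with an optional system part."""
    prefix = f"<|system|>{system}" if system else ""
    return f"{prefix}<|user|>{user}<|model|>{model}{EOS}"


def skeleton(turn: str, tags: dict[str, str]) -> str:
    """Replace format-specific tags with generic markers to expose the structure."""
    for tag, marker in tags.items():
        turn = turn.replace(tag, marker)
    return turn


# Assumed mapping: which tag opens the user part and which one closes it.
MISTRAL_TAGS = {"[INST]": "<OPEN>", "[/INST]": "<CLOSE>"}
METHARME_TAGS = {"<|user|>": "<OPEN>", "<|model|>": "<CLOSE>"}

if __name__ == "__main__":
    a = skeleton(mistral_turn("user input", "model output"), MISTRAL_TAGS)
    b = skeleton(metharme_turn("user input", "model output"), METHARME_TAGS)
    print(a)
    print(b)
    print("same structure:", a == b)
```

Without a system prompt the two skeletons come out identical, which is the structural overlap being pointed at. Whether mapping the Metharme keywords onto dedicated tokens would actually help the fine-tune is a separate question this sketch can't answer.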