A small opinion. (Now a long feedback thread!)

#1
by Diavator - opened

The model is really good; I downloaded the GGUF from mradermacher. It understands complex characters where aggression borders on vulnerability, the kind usually called tsundere. Even the big models (Llama 3 70B+, WizardLM 2 8x22B) didn't cope well with this character, eventually reducing his behaviour to something average. Thank you for your hard work!

That's great to hear! Pantheon's training is supposed to do exactly that, so I'm glad to hear it's working.

I'm liking it a lot as well after switching back to the ChatML format as you suggested. This improves upon the last model for sure: it's descriptive in all the right ways, noticeably smarter overall, better at knowing human anatomy and its limitations, handles longer context, keeps track of details better, and writes very well; better than 1.0 in basically every way. I'd still like to compare it with other Nemo finetunes/merges.

Interestingly, while it struggles a little at picking up the gist at first, after a bit of a nudge and some edits on the initial step it really gets what you're going for, runs with it, and is smart about picking up relevant details from there on. This is kind of the opposite of other Nemo finetunes, which somehow seem better at picking up what you want initially, but then struggle to flesh out the details and slowly revert to a neutral, boring AI way of writing over time. I'm not sure why this happens; maybe others know better, if it's even a real thing.

I also needed to logit bias away variations of the word "dark", as it kept popping up even when the situation had nothing to do with it, but that's the only word I had to do this for. That said, it still sometimes mixes up details, and more frequently pronouns (from he/she to you), plus some spelling errors, but it's all easily correctable and doesn't usually ruin the response as a whole; it's probably a Nemo/LLM problem in general. There's a small consistency issue when the context gets a bit longer, and a mild repetition problem, mostly in regards to formatting and paragraph structure. Regardless, this is excellent and one of, if not the best, Nemo finetunes so far in my experience.
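(Side note, since "logit bias" came up: for anyone wondering what biasing away a word looks like outside the SillyTavern UI, here is a minimal sketch. It assumes a backend with an OpenAI-compatible /v1/completions endpoint that honours logit_bias (koboldcpp exposes one, on port 5001 by default), and the tokenizer repo, port and bias value below are illustrative assumptions rather than anything from this thread.)

```python
# Rough sketch: discourage variations of an unwanted word via logit_bias.
# Assumes an OpenAI-compatible /v1/completions endpoint that supports
# logit_bias (check your backend) and a locally available tokenizer.
import requests
from transformers import AutoTokenizer

# Placeholder tokenizer; swap in the repo matching the model you're running.
tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")

variants = ["dark", "Dark", " dark", " Dark", "darkness", " darkness"]
logit_bias = {}
for text in variants:
    for token_id in tokenizer.encode(text, add_special_tokens=False):
        logit_bias[str(token_id)] = -25  # strong discouragement, not a hard ban

resp = requests.post(
    "http://localhost:5001/v1/completions",  # default koboldcpp port; adjust to your setup
    json={
        "prompt": "The corridor ahead was",
        "max_tokens": 120,
        "logit_bias": logit_bias,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["text"])
```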

I noticed it liked to cling to a pattern of speech.
E.g.

I bet you ... . I bet you ... . I bet you ... . I bet you ... .

This is from one 107 token message with details removed x•x

106 token message:

I can just make... . I can make... . I can make... . I can make...

I had a bit of that issue too until I switched to ChatML, with rep penalty 1.05 and DRY 0.8/1.75/2/0. Not sure if you're already using those, but if you aren't, see if that helps!
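(For anyone driving koboldcpp directly rather than through SillyTavern, those settings map onto its native generate API roughly as below. The DRY field names are my best guess at what recent koboldcpp builds accept, so double-check your version's API docs; the prompt, port and max length are placeholders.)

```python
# Sketch: sending the sampler settings mentioned above (rep pen 1.05,
# DRY 0.8 / 1.75 / 2 / 0) to koboldcpp's native generate endpoint.
# The DRY field names below match what recent koboldcpp builds appear to
# accept; treat them as an assumption and verify against your version.
import requests

payload = {
    "prompt": "<|im_start|>user\nDescribe the tavern.<|im_end|>\n<|im_start|>assistant\n",
    "max_length": 400,
    "temperature": 1.0,
    "rep_pen": 1.05,          # repetition penalty
    "dry_multiplier": 0.8,    # DRY multiplier
    "dry_base": 1.75,         # DRY base
    "dry_allowed_length": 2,  # DRY allowed length
    "dry_penalty_range": 0,   # 0 = whole context, as in the settings above
    "stop_sequence": ["<|im_end|>"],
}

r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
print(r.json()["results"][0]["text"])
```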

Sadly repetition is a typical issue with Mistral-trained models, and hard to get rid of.

I eventually decided on Nemo for the smarter brain it offers, but I do plan on following the same multi-stage finetuning sequence for a Llama 3.1 8B model to see how it compares.

Hats off to you, sir. Professionalism, as always.

Gryphe, thank you so much - this is the perfect model for me! <3

In all my RPs the responses have also drifted from the "he/she" format to "you". There were formatting problems too: the model doesn't like to put spoken dialogue in inverted commas, reducing everything to the format action dialogue action. But next to the thought-provoking and deeply emotional responses of this model, such problems simply pale into insignificance.
I've also noticed the model has a favourite swear word, in various forms: "Fuck me sideways with a (cactus)". The word in brackets varies regularly; I've already seen three variants: a rusty chainsaw, a cactus and a pillar.

I use SillyTavern, and this model writes very long messages; even the token limits don't help. To load the model I use koboldcpp. Can anyone suggest a solution to the problem?

Hmm, maybe you can try trimming incomplete sentences and telling it to be concise in the instruct prompt? I guess it depends on the person, because some ppl prefer longer messages haha.
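(For reference, "trim incomplete sentences" is a SillyTavern output option; the basic idea is simple enough to sketch: cut the reply back to its last sentence-ending punctuation so a hard token cap doesn't leave a dangling half-sentence. This is only a rough approximation of what the real option does.)

```python
import re

def trim_incomplete_sentence(text: str) -> str:
    """Cut a reply back to its last complete sentence.

    Rough approximation of the idea: find the final ., !, or ? and drop
    whatever trails off after it. (Ignores trailing quotes/brackets for
    simplicity; the real SillyTavern option is more thorough.)
    """
    matches = list(re.finditer(r"[.!?]", text))
    if not matches:
        return text.strip()  # no sentence end at all; return unchanged
    return text[: matches[-1].end()].strip()


reply = "She grins at you. I bet you did not expect that. And then she"
print(trim_incomplete_sentence(reply))
# -> "She grins at you. I bet you did not expect that."
```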

In my opinion, 1800 tokens is too much! Instead of RP we get storytelling, high-quality storytelling, sure, but you also want to participate in it and not just read.)))

The ones I get back are usually around 200-400 tokens. If you start a chat with a long initial message and let it add more and more though, I can see it snowballing into a giant essay per response lol. Like with most models, if you trim the responses down for the first few, it usually doesn't snowball beyond that.

Yes, that's exactly what happens. Apparently I'm just used to MLewd-ReMM-L2-Chat-20B and Noromaid-v0.4-Mixtral-Instruct-8x7b; I never noticed such problems with them. And my last RPs were on 70B+ models on together.ai, which may be verbose but follow length instructions very well. It's a pity that no one will undertake to finish training WizardLM 2 8x22B; I think that model looks very favourable compared to LLaMA 3.

Hey all, fantastic to see all the back and forth going on in this thread - I haven't really experienced any issues with message lengths myself in my extensive testing, but I exclusively use GGUFs for that, which might produce different results. The rebuilt persona dataset consists of full 4k dialogue examples, and that could be one reason it's biased towards producing longer content. (Or it's just Nemo.)

Either way, all your feedback (both positive and negative) is super valuable and I'm already brainstorming some ideas for the next iteration. Despite Nemo's shortcomings I'm going to continue using it a while longer simply because Llama 3.1 8B pales in comparison when it comes to complex roleplay.

Gryphe changed discussion title from A small opinion. to A small opinion. (Now a long feedback thread!)
