Iambe v2 vs Iambe v3

#2
by anmol989 - opened

In my initial testing the v3 model does not seem to follow instructions and keeps generating the story in a very random way. I will do a full comparison test in my free time tomorrow!

With the recent news of Gemini, it's starting to feel like open-source LLMs are lagging far behind. I am not as impressed by open-source LLMs anymore, and the excitement is gone knowing that yet another closed model exists that competes with GPT-4.

I hope Llama 3 comes soon and they decide to open-source it, unlike companies like Google and OpenAI.

Thank you for your testing!

Yeah, cDPO has a value called beta that controls how "wild" the model is, and I almost certainly set it too high. It works okay for RP or a loose story request, but not so well under structure.

I've forked Iambe, by the way. Going forward, this RP line will focus on RP responsiveness at the expense of story writing, and Iambe Storyteller will sacrifice RP ability in favor of longer-form writing.
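For context on the beta knob: cDPO is DPO with label smoothing, and beta scales the margin inside the sigmoid. Here's a minimal sketch of the per-pair loss (my own reconstruction of the published cDPO objective, not Iambe's actual training code):

```python
import math

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))

def cdpo_loss(policy_logratio: float, ref_logratio: float,
              beta: float = 0.1, eps: float = 0.1) -> float:
    """Conservative DPO (cDPO) loss for one preference pair.

    policy_logratio / ref_logratio are log p(chosen) - log p(rejected)
    under the policy and the frozen reference model. beta scales the
    implicit reward margin; eps is the label-smoothing weight that makes
    plain DPO "conservative". Setting eps = 0 recovers standard DPO.
    """
    logits = beta * (policy_logratio - ref_logratio)
    return -(1 - eps) * log_sigmoid(logits) - eps * log_sigmoid(-logits)
```

One nice property of the smoothing term: unlike plain DPO, the cDPO loss has a finite minimiser (at beta times the margin equal to log((1-eps)/eps)), so pushing the policy arbitrarily far from the reference eventually *increases* the loss rather than decreasing it.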

I spent a few hours yesterday with the RP v3 Q6 version. For ERP it maintains Storyteller's detail advantage over other models; RP-wise I encountered the following problems:
- Adherence to prompts is okay but not great; details that are not emphasized get lost.
- With some characters, fixed repetitions (with slight variations) occur in the first few rounds of dialogue, with almost identical formatting and placement; for example, a sentence in the penultimate paragraph is repeated and slightly modified.
- Over the course of 30-50 rounds of dialogue it gradually repeats the topic, and the descriptions drift from RP into narrator voice.
- After 50 rounds it is very persistent in trying to repeat previous scenes. With a character that switches scenes frequently, the first few dozen rounds of dialogue are fairly random, but then it's almost always a repeat of earlier scenes with exactly the same storyline; when my partner and I split up and try to act on our own, it always brings the storyline back to "my partner and I meet again!" and repeats it.

Interesting! I'm working on a dataset update for an Iambe-RP-v2, so this is invaluable feedback.

> Thank you for your testing!
>
> Yeah, cDPO has a value called beta that controls how "wild" the model is, and I almost certainly set it too high. It works okay for RP or a loose story request, but not so well under structure.
>
> I've forked Iambe, by the way. Going forward, this RP line will focus on RP responsiveness at the expense of story writing, and Iambe Storyteller will sacrifice RP ability in favor of longer-form writing.

I'm really glad someone is doing this and thinking this way!
I've noticed how a lot of people will mix RP and storywriting models together, but in using LLMs I've found there is a real difference between an "RP reply" and a "story reply." I can be having a nice back-and-forth of 2-3 paragraph replies with intermixed quotes and narrative, and then suddenly I'll get a response that's 75% narrative: several lines of plain, unitalicised text with no quotes that looks and reads like a story, often jumping straight to the end, sometimes even WITH a "The End." Other times I'll get a "fair use disclaimer" or "content warning." That stuff comes from training data that contained these things (storywriting datasets, forum posts).

Rather than an LLM that "can do both" and ends up with results like this, I'd prefer models that focus exclusively on either good RP or good storywriting. That way the RP model should be better at RP and the storywriting model better at storywriting.
Going forward, "cleaner" and "purer" data is the answer: focus on what's needed for the target use case and only use datasets that are formatted well for that purpose.

I suspect a lot of the weird artefacts we're seeing in models are caused by inconsistently formatted datasets combined with mixing datasets intended for different purposes.
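As a concrete (and entirely hypothetical) illustration of that kind of dataset hygiene, a filter for an RP-only training set might drop any example whose reply drifts into story formatting. The field name and heuristics below are my own illustrative assumptions, not taken from any real Iambe dataset:

```python
import re

# Story-style tells that should not appear in an RP reply: wrap-ups,
# content warnings, fair-use disclaimers (illustrative patterns only).
STORY_TELLS = re.compile(
    r"\b(The End|Epilogue)\b|content warning|fair use",
    re.IGNORECASE,
)

def looks_like_rp(reply: str) -> bool:
    """Heuristic check that a reply follows the RP formatting convention."""
    has_dialogue = '"' in reply                   # spoken lines in quotes
    has_narration = "*" in reply                  # *italicised* actions
    ends_story = bool(STORY_TELLS.search(reply))  # story-style artefacts
    return has_dialogue and has_narration and not ends_story

def filter_dataset(examples):
    """Keep only examples whose 'reply' field follows the RP convention."""
    return [ex for ex in examples if looks_like_rp(ex["reply"])]
```

A filter this crude would obviously throw out some good examples too, but the point stands: a dataset formatted one way, for one purpose, gives the model far less reason to produce "The End" in the middle of a roleplay.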
