
Fantastic Model, here are a few observations

#3
by Nycoorias - opened

I have been using this for a few weeks now, and I can say that it is amazing. It feels pretty much like an upgrade to 0.2 (which is funny, because I remember the non-DPO version of 0.2 being better), though I did notice two things.

For context, I have been using a GGUF version on KoboldCpp.

The first is that it has a rather strong recency bias (I hope that is the right term). Most notably, I had it write a dialogue between two characters; after about 500 tokens, character B asked character A something, and A answered in the complete opposite way to how she should have (and the question was about something fundamental to her character).

The second thing is that it really benefits from system prompts. Basic instruct mode works fine, but other modes will flat out not function without one. When I was using chat mode, it started dropping the reply halfway through and writing something else. Like: “Yes, I know this I is probably a bad idAnd so the story unfolds around…

Simply putting something like [You are being used as a chat bot] into memory made it work fine; still, this is not an issue that 0.2 had in my experience. I will say, though, that my usage of other modes is low, so you might want to get a second opinion on this part.
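For anyone scripting against the model instead of using the UI, a minimal sketch of the same workaround: passing that hint as a system message rather than via KoboldCpp's memory field. This assumes a local KoboldCpp instance with its OpenAI-compatible endpoint enabled (by default at http://localhost:5001/v1/chat/completions); the helper name and the 512-token limit are just illustrative choices.

```python
import json

def build_chat_request(user_message: str) -> str:
    """Build the JSON body for a chat completion that includes a system prompt.

    Hypothetical helper: the system message plays the same role as putting
    "[You are being used as a chat bot]" into KoboldCpp's memory field.
    """
    payload = {
        "messages": [
            # The system prompt keeps chat mode from drifting mid-reply.
            {"role": "system", "content": "You are being used as a chat bot."},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": 512,  # illustrative cap, not a recommendation
    }
    return json.dumps(payload)

body = build_chat_request("Hello!")
# POST `body` to http://localhost:5001/v1/chat/completions with any HTTP client.
print(body)
```

The point is only where the hint goes: as the first message with role "system", before the user turn, so every reply is conditioned on it.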

Ok, the unconsenting of this one needs more work, way more.
