Perhaps You Should Use a Different Base Model For Go Bruins

#7
by deleted - opened

This was already a MetaMath, Cybertron & Starling SLERP merge, with Cybertron having had lots of DPO training.

Further training might have pushed the benchmark scores up a tiny bit, but it's performing worse than the aforementioned @Q-bert SLERP merge. For example, it's hallucinating more at the fringes of its knowledge and ignoring story prompts more often (partly by injecting NSFW content into the stories).

I'm not a prude. It's just that the last fine-tuning pass tends to dominate, so taking the current best overall-performing LLM and adding NSFW DPO on top is far from ideal. It would turn out better if one of the three source models had such content added prior to being merged. And since NSFW role play doesn't require a smart model, why add it to the top performer just so it sticks out at the top of the leaderboard?

Merge: https://huggingface.co/Q-bert/MetaMath-Cybertron-Starling

Metamath: https://huggingface.co/meta-math/MetaMath-Mistral-7B
Cybertron: https://huggingface.co/fblgit/una-cybertron-7b-v2-bf16
Starling: https://huggingface.co/berkeley-nest/Starling-LM-7B-alpha
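For context, a SLERP merge interpolates each pair of corresponding weight tensors along the arc between them instead of averaging linearly, which preserves the magnitude of the weights better. A minimal NumPy sketch of the per-tensor operation (this is illustrative, not the actual mergekit implementation; `t=0.5` and the fallback threshold are assumptions):

```python
import numpy as np

def slerp(a: np.ndarray, b: np.ndarray, t: float = 0.5, eps: float = 1e-8) -> np.ndarray:
    """Spherical linear interpolation between two weight tensors of the same shape."""
    # Compute the angle between the flattened, normalized tensors.
    a_flat, b_flat = a.ravel(), b.ravel()
    a_unit = a_flat / (np.linalg.norm(a_flat) + eps)
    b_unit = b_flat / (np.linalg.norm(b_flat) + eps)
    dot = np.clip(np.dot(a_unit, b_unit), -1.0, 1.0)
    theta = np.arccos(dot)

    # Nearly parallel tensors: fall back to plain linear interpolation.
    if theta < eps:
        return (1 - t) * a + t * b

    # Standard SLERP weighting along the great-circle arc.
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
```

In a real merge this would be applied layer by layer across the models' state dicts, often with a different `t` per layer type (as mergekit's gradient option allows).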
