26B?

#2
by FrenzyBiscuit - opened

Any plans for a 26B MoE version? My users are demanding it!

I'll queue it for the oven lol

Thanks!

Thinking doesn't survive even 1 epoch of training on the Gutenberg 31B, sadly. The 26B should survive though; I've never seen the 26B MoE have broken thinking with my data.

Oh that's good feedback. I can imagine why this dataset with no reasoning would break the model's thinking. Will probably try some sort of reasoning repair from Claude traces on top of this next. (26B still next in line though haha)

Well, to be fair, 1.5-2 epochs breaks thinking on the base model as well.

I really need to get a reasoning repair dataset. I tried a LoRA from @Darkhn with no success, sadly.
