
This is a dumb experiment - don't expect it to be good!

I merged a few Mixtral models together, then tuned only the routing parameters. There was a pretty steep drop in loss with only a bit of training - it went from ~0.99 to ~0.7 over about ten million tokens.
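For the curious, router-only tuning looks roughly like this with transformers - a minimal sketch, not the exact training script, and it assumes the stock Mixtral module naming (`block_sparse_moe.gate`):

```python
import torch
from transformers import AutoModelForCausalLM

# Load the merged model (repo name matches this card).
model = AutoModelForCausalLM.from_pretrained(
    "chargoddard/mixtralmerge-8x7B-rebalanced-test",
    torch_dtype=torch.bfloat16,
)

# Freeze everything except the MoE router gates. In the standard
# transformers Mixtral implementation these live at
# model.layers.<i>.block_sparse_moe.gate.
for name, param in model.named_parameters():
    param.requires_grad = "block_sparse_moe.gate" in name

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable params: {trainable:,} / {total:,}")
```

The router gates are a tiny fraction of the total parameter count, which is why so little training data moved the loss that much.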

I'm hoping this after-the-fact balancing will have reduced some of the nasty behavior typical of current tunes. But maybe it just made it even dumber! We'll see.

Uses ChatML format.
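For reference, a ChatML prompt is structured like this (the message contents are just placeholders):

```
<|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Hello!<|im_end|>
<|im_start|>assistant
```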

Will update with more details if it turns out to be promising.
