
Built with Axolotl

QLoRA fine-tune of mistralai/Mixtral-8x7B-v0.1.

My main reason for training this model was to investigate an altered router balancing loss combined with the z-loss introduced in ST-MoE: Designing Stable and Transferable Sparse Expert Models. The result is pretty decent, I think! It does a good job of respecting character information in system prompts and performs adequately on a few simple coding tasks.
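For reference, the ST-MoE router z-loss penalizes large router logits: it is the mean, over tokens, of the squared log-sum-exp of each token's router logits. The snippet below is a minimal plain-Python sketch of that formula only. The function name `router_z_loss` and the list-of-lists input layout are illustrative assumptions; this is not the code from my custom Transformers branch.

```python
import math


def router_z_loss(logits):
    """Sketch of the ST-MoE router z-loss.

    logits: list of tokens, each a list with one router logit per expert.
    Returns mean over tokens of (logsumexp(token_logits)) ** 2.
    """
    total = 0.0
    for row in logits:
        m = max(row)  # subtract the max so exp() cannot overflow
        lse = m + math.log(sum(math.exp(x - m) for x in row))
        total += lse ** 2
    return total / len(logits)


# Two experts with equal logits: logsumexp = log(2), so the loss is log(2)**2.
print(router_z_loss([[0.0, 0.0]]))
```

In training, this term is multiplied by a small coefficient and added to the language-modeling loss alongside the balancing loss.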

To train this, I used a custom branch of Transformers that adds z-loss and reimplements the router balancing loss based on the version in MegaBlocks. The config I used with my custom, hacked-up branch of Axolotl is available here.
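As a rough illustration of what a router balancing loss computes, here is a generic Switch-Transformer-style auxiliary loss with top-k routing, written in plain Python. It multiplies the number of experts by the sum over experts of (fraction of token-to-expert assignments routed to that expert) × (mean router probability for that expert), normalizing assignment counts by tokens × top_k. This is an assumed formulation for illustration: the function names are hypothetical, it is not the code from my branch, and the exact scaling MegaBlocks uses may differ.

```python
import math


def softmax(row):
    m = max(row)  # numerically stable softmax
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def load_balancing_loss(logits, top_k):
    """Sketch of a Switch-style balancing loss with top-k routing.

    logits: list of tokens, each a list with one router logit per expert.
    Returns num_experts * sum_e f_e * P_e, which equals 1.0 when
    both assignments and probabilities are perfectly uniform.
    """
    num_tokens = len(logits)
    num_experts = len(logits[0])
    counts = [0] * num_experts       # how many (token, slot) pairs hit each expert
    prob_sums = [0.0] * num_experts  # summed router probability per expert
    for row in logits:
        probs = softmax(row)
        # Indices of the top_k highest-probability experts for this token.
        chosen = sorted(range(num_experts), key=lambda e: probs[e], reverse=True)[:top_k]
        for e in chosen:
            counts[e] += 1
        for e in range(num_experts):
            prob_sums[e] += probs[e]
    f = [c / (num_tokens * top_k) for c in counts]  # assignment fractions
    p = [s / num_tokens for s in prob_sums]         # mean probabilities
    return num_experts * sum(fi * pi for fi, pi in zip(f, p))


# Router collapsed onto expert 0: the loss approaches num_experts (4 here).
print(load_balancing_loss([[10.0, 0.0, 0.0, 0.0]] * 4, top_k=1))
```

The loss is minimized when the router spreads probability evenly across experts, which is what pushes the experts toward balanced utilization.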

Uses my favorite non-ChatML, token-economical chat prompt format. Messages should be prefixed with " ***System:", " ***Query:", or " ***Response:" for system, user, and model messages respectively. No newlines are necessary, but the space before the triple asterisk is mandatory.
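A minimal sketch of building a prompt in this format is shown below. The helper name `build_prompt`, the role names, the single space after each colon, and ending the prompt with a bare " ***Response:" to cue the model's reply are my assumptions for illustration; the only fixed parts are the three prefixes and the leading space before each triple asterisk.

```python
# Prefixes from the model card; note the mandatory leading space.
PREFIXES = {
    "system": " ***System:",
    "user": " ***Query:",
    "assistant": " ***Response:",
}


def build_prompt(messages):
    """Join (role, text) pairs into one prompt string, no newlines needed.

    Assumption: one space separates each prefix from its message text,
    and the prompt ends with " ***Response:" so the model writes the reply.
    """
    parts = [PREFIXES[role] + " " + text for role, text in messages]
    return "".join(parts) + PREFIXES["assistant"]


print(build_prompt([("system", "You are Bob."), ("user", "Hi")]))
```

For the example above, the result is " ***System: You are Bob. ***Query: Hi ***Response:".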

Safetensors · Model size: 46.7B params · Tensor type: BF16


Datasets used to train chargoddard/MixtralRPChat-ZLoss