This is a 4x8b Llama Mixture of Experts (MoE) model, trained on the OpenHermes Resort portion of the Dolphin-2.9 dataset. It combines 4 Llama fine-tunes using the DeepSpeed-MoE architecture, and all experts are active for every token.

This is a VERY good model, landing somewhere between Llama 8B and Llama 70B in capability. Enjoy!

Thank you to:
- CrusoeEnergy for sponsoring the compute for this project
- My collaborators Eric Hartford and Fernando (has too many names) Neto
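
To make the "all experts active for every token" point concrete, here is a minimal sketch of a dense MoE feed-forward block: a softmax gate weights the outputs of every expert instead of routing each token to a top-k subset. All names, layer sizes, and the module structure below are illustrative assumptions for exposition, not the model's actual implementation or DeepSpeed-MoE's internals.

```python
# Sketch of a dense MoE feed-forward block: every expert processes every token,
# and a softmax gate mixes their outputs. Hypothetical names and sizes.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenseMoEFFN(nn.Module):
    def __init__(self, hidden_size=4096, intermediate_size=14336, num_experts=4):
        super().__init__()
        # Gate produces one mixing weight per expert for each token.
        self.gate = nn.Linear(hidden_size, num_experts, bias=False)
        # Each expert is a standard Llama-style gated-free MLP (simplified here).
        self.experts = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_size, intermediate_size, bias=False),
                nn.SiLU(),
                nn.Linear(intermediate_size, hidden_size, bias=False),
            )
            for _ in range(num_experts)
        ])

    def forward(self, x):
        # x: (batch, seq_len, hidden_size)
        weights = F.softmax(self.gate(x), dim=-1)                         # (batch, seq, experts)
        expert_outs = torch.stack([e(x) for e in self.experts], dim=-1)   # (batch, seq, hidden, experts)
        # Weighted sum over ALL experts -- no top-k routing, so every expert
        # contributes to every token's output.
        return (expert_outs * weights.unsqueeze(-2)).sum(dim=-1)
```

The trade-off of this dense formulation is that compute scales with the number of experts (4x the FFN cost here), whereas sparse top-k MoE models only pay for the selected experts per token.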