LoliCore 1B
This is a very small MoE (Mixture of Experts) model that I use to experiment with different MLP settings. In this repo specifically, I added a Jump module (which passes the hidden state directly to the next layer) to test whether it works in an MoE.
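To make the idea concrete, here is a minimal, dependency-free sketch of one way a Jump module could sit inside an MoE layer. It is an illustration only, not the code in this repo: it assumes the Jump module is treated as one extra expert that returns the hidden state unchanged, and that a softmax router with top-k selection picks the experts. The names `jump_expert`, `make_scale_expert`, and `moe_layer` are hypothetical, and the "scale" expert is just a stand-in for a real MLP expert.

```python
import math


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def jump_expert(hidden):
    # Hypothetical "Jump" expert: returns the hidden state unchanged,
    # so the token effectively skips the MLP in this layer.
    return list(hidden)


def make_scale_expert(factor):
    # Stand-in for a real MLP expert (assumption: any function that maps
    # a hidden vector to a vector of the same size would fit here).
    def expert(hidden):
        return [factor * h for h in hidden]
    return expert


def moe_layer(hidden, router_weights, experts, top_k=1):
    # Router logits: one dot product per expert.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in router_weights]
    probs = softmax(logits)

    # Keep the top_k experts and renormalize their gate values.
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)

    # Weighted sum of the selected experts' outputs.
    out = [0.0] * len(hidden)
    for i in top:
        gate = probs[i] / norm
        expert_out = experts[i](hidden)
        out = [o + gate * e for o, e in zip(out, expert_out)]
    return out, top


# Usage: expert 0 is the Jump expert, expert 1 doubles the hidden state.
experts = [jump_expert, make_scale_expert(2.0)]
hidden = [1.0, 2.0]
router_weights = [[1.0, 1.0], [0.0, 0.0]]  # router favors expert 0 for this input
output, chosen = moe_layer(hidden, router_weights, experts, top_k=1)
```

With this router the token is sent to the Jump expert, so `output` equals `hidden` and the layer acts as a pass-through for that token. Whether the real model routes to the Jump module this way, or applies it differently, is not stated here.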