### Micro Mistral
This is a small Mistral model with 6 layers.
Like the smol Llama variants, it uses grouped-query attention (GQA) and tied embeddings. Unlike them, it follows the Mistral-style architecture, which also uses sliding-window attention.
Combining GQA and tied embeddings yields an efficient 0.5B-parameter model. Because it uses the standard Mistral architecture, it is supported by downstream applications that already handle Mistral models.
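
The three architectural ideas above can be illustrated without a deep-learning framework. The sketch below is a hypothetical, stdlib-only illustration. The function names and all sizes (8 query heads, 2 key/value heads, a window of 3, a 32000-token vocabulary, hidden size 768) are assumptions chosen for the arithmetic, not this model's actual configuration. It maps query heads to shared key/value heads as in GQA, builds a causal sliding-window attention mask, and counts the embedding parameters saved by tying the input embedding to the output projection. In this sketch a position attends to the most recent `window` positions including itself; implementations differ on this off-by-one convention.

```python
# Hypothetical, framework-free sketch of GQA, sliding-window attention, and tied embeddings.
# All sizes used here are illustrative assumptions, not Micro Mistral's real config.


def kv_head_for_query_head(q_head: int, num_q_heads: int, num_kv_heads: int) -> int:
    """GQA: consecutive groups of query heads share one key/value head."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("num_q_heads must be a multiple of num_kv_heads")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal sliding-window mask: query i may attend to key j iff j <= i and i - j < window."""
    return [[(j <= i) and (i - j < window) for j in range(seq_len)] for i in range(seq_len)]


def embedding_params(vocab_size: int, hidden_size: int, tied: bool) -> int:
    """Parameters in the input embedding plus the output (LM head) projection."""
    one_matrix = vocab_size * hidden_size
    # Tied embeddings reuse the same matrix for input and output, so it is counted once.
    return one_matrix if tied else 2 * one_matrix


if __name__ == "__main__":
    # GQA: 8 query heads sharing 2 key/value heads.
    print([kv_head_for_query_head(h, num_q_heads=8, num_kv_heads=2) for h in range(8)])
    # -> [0, 0, 0, 0, 1, 1, 1, 1]

    # Sliding window of 3 over 5 tokens ("x" = allowed, "." = masked).
    for row in sliding_window_mask(seq_len=5, window=3):
        print("".join("x" if allowed else "." for allowed in row))
    # -> x....  xx...  xxx..  .xxx.  ..xxx  (one row per line)

    saved = embedding_params(32000, 768, tied=False) - embedding_params(32000, 768, tied=True)
    print(f"tying saves {saved:,} parameters")  # -> tying saves 24,576,000 parameters
```

At small scale the embedding matrices are a large share of the total parameter count, which is why tying them matters more for a 0.5B model than for a large one.
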
#### Dataset
- Minipile
- Instruct
- Math
- OpenOrca
- Synthetic Data

TODO: Complete the Dataset section.