---
datasets:
- allenai/MADLAD-400
---

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested. We trained this model for the Kyrgyz language using the dataset linked above.

**Model Architecture**

Mistral-7B-v0.1 is a transformer model with the following architecture choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

**Troubleshooting**

If you see the following error:

`KeyError: 'mistral'`

Or:

`NotImplementedError: Cannot copy out of meta tensor; no data!`

Make sure you are using a stable version of Transformers, 4.34.0 or newer.

**Notice**

Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.
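To make the Grouped-Query Attention choice concrete, here is a minimal sketch of how query heads map onto a smaller set of key/value heads. The head counts (32 query heads, 8 KV heads) follow Mistral-7B-v0.1's published config; the helper function itself is hypothetical, for illustration only.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int = 32, n_kv_heads: int = 8) -> int:
    """Grouped-Query Attention sketch: consecutive query heads share one
    key/value head, shrinking the KV cache by n_q_heads / n_kv_heads.
    With 32 query heads and 8 KV heads, each KV head serves 4 query heads."""
    group_size = n_q_heads // n_kv_heads  # 4 query heads per KV head
    return q_head // group_size

# Query heads 0-3 read KV head 0, heads 4-7 read KV head 1, and so on.
print([kv_head_for_query_head(q) for q in range(8)])
```

Because only 8 KV heads are cached instead of 32, the KV cache during generation is a quarter of the size it would be with standard multi-head attention.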
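Sliding-Window Attention restricts each position to a fixed-size window of recent context instead of the full causal prefix. The sketch below builds that mask in pure Python; the tiny window of 3 is illustrative only (the actual model uses a window of 4096 tokens).

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return a seq_len x seq_len mask where mask[q][k] is True when
    query position q may attend to key position k: itself plus the
    previous (window - 1) positions, never any future position."""
    return [
        [q - window < k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]

# Visualize the pattern: "x" = attended, "." = masked out.
for q, row in enumerate(sliding_window_mask(seq_len=6, window=3)):
    print(q, "".join("x" if allowed else "." for allowed in row))
```

Stacking layers lets information still propagate beyond the window, since each layer's window looks one window further back through the previous layer's outputs.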
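Byte fallback matters for a Kyrgyz model: when a character (e.g. a Cyrillic letter) has no token in the vocabulary, the tokenizer emits its raw UTF-8 bytes instead of an unknown-token placeholder, so no text is ever lost. A toy sketch, with a hypothetical one-character vocabulary:

```python
def tokenize_with_byte_fallback(text: str, vocab: set[str]) -> list[str]:
    """Toy byte-fallback tokenizer: characters present in the vocab become
    tokens directly; anything else falls back to one token per UTF-8 byte,
    written as <0xNN>, so no input ever maps to an <unk> token."""
    tokens: list[str] = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

# Cyrillic "ж" is absent from the toy vocab, so it becomes its two UTF-8 bytes.
print(tokenize_with_byte_fallback("aж", {"a"}))  # → ['a', '<0xD0>', '<0xB6>']
```

The real tokenizer applies BPE merges over learned subwords rather than single characters, but the fallback path for out-of-vocabulary characters works the same way.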