---
datasets:
- allenai/MADLAD-400
---

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested. We trained this model for the Kyrgyz language using the dataset linked above.

**Model Architecture**

Mistral-7B-v0.1 is a transformer model with the following architecture choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer

**Troubleshooting**

If you see the following error:

`KeyError: 'mistral'`

Or:

`NotImplementedError: Cannot copy out of meta tensor; no data!`

Make sure you are using a stable version of Transformers, 4.34.0 or newer.

**Notice**

Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms.
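To make the Grouped-Query Attention choice concrete, here is a minimal sketch of how query heads map onto a smaller set of key/value heads. The head counts (32 query heads, 8 KV heads) follow Mistral-7B-v0.1's published config; the helper function itself is hypothetical, for illustration only.

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int = 32, n_kv_heads: int = 8) -> int:
    """Grouped-Query Attention sketch: consecutive query heads share one
    key/value head, shrinking the KV cache by n_q_heads / n_kv_heads.
    With 32 query heads and 8 KV heads, each KV head serves 4 query heads."""
    group_size = n_q_heads // n_kv_heads  # 4 query heads per KV head
    return q_head // group_size

# Query heads 0-3 read KV head 0, heads 4-7 read KV head 1, and so on.
print([kv_head_for_query_head(q) for q in range(8)])
```

Because only 8 KV heads are cached instead of 32, the KV cache during generation is a quarter of the size it would be with standard multi-head attention.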
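Sliding-Window Attention restricts each position to a fixed-size window of recent context instead of the full causal prefix. The sketch below builds that mask in pure Python; the tiny window of 3 is illustrative only (the actual model uses a window of 4096 tokens).

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Return a seq_len x seq_len mask where mask[q][k] is True when
    query position q may attend to key position k: itself plus the
    previous (window - 1) positions, never any future position."""
    return [
        [q - window < k <= q for k in range(seq_len)]
        for q in range(seq_len)
    ]

# Visualize the pattern: "x" = attended, "." = masked out.
for q, row in enumerate(sliding_window_mask(seq_len=6, window=3)):
    print(q, "".join("x" if allowed else "." for allowed in row))
```

Stacking layers lets information still propagate beyond the window, since each layer's window looks one window further back through the previous layer's outputs.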
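Byte fallback matters for a Kyrgyz model: when a character (e.g. a Cyrillic letter) has no token in the vocabulary, the tokenizer emits its raw UTF-8 bytes instead of an unknown-token placeholder, so no text is ever lost. A toy sketch, with a hypothetical one-character vocabulary:

```python
def tokenize_with_byte_fallback(text: str, vocab: set[str]) -> list[str]:
    """Toy byte-fallback tokenizer: characters present in the vocab become
    tokens directly; anything else falls back to one token per UTF-8 byte,
    written as <0xNN>, so no input ever maps to an <unk> token."""
    tokens: list[str] = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens

# Cyrillic "ж" is absent from the toy vocab, so it becomes its two UTF-8 bytes.
print(tokenize_with_byte_fallback("aж", {"a"}))  # → ['a', '<0xD0>', '<0xB6>']
```

The real tokenizer applies BPE merges over learned subwords rather than single characters, but the fallback path for out-of-vocabulary characters works the same way.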