---
datasets:
- allenai/MADLAD-400
---
|
The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested. |
|
|
|
We trained this model for the Kyrgyz language using the MADLAD-400 dataset linked above.
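As a minimal usage sketch, the model should load through the standard `transformers` API; the repository id below is a placeholder, not this model's actual Hub id:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository id -- substitute this model's actual Hub id.
model_id = "your-org/mistral-7b-kyrgyz"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Continue a short Kyrgyz prompt ("Kyrgyzstan is ...").
inputs = tokenizer("Кыргызстан – бул", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```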
|
|
|
|
|
|
|
**Model Architecture** |
|
|
|
Mistral-7B-v0.1 is a transformer model with the following architecture choices:

- Grouped-Query Attention
- Sliding-Window Attention
- Byte-fallback BPE tokenizer
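These choices can be confirmed from the model configuration; a small inspection sketch using `transformers` (attribute names are those of `MistralConfig`):

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("mistralai/Mistral-7B-v0.1")

print(config.num_attention_heads)  # 32 query heads
print(config.num_key_value_heads)  # 8 key/value heads -> grouped-query attention
print(config.sliding_window)       # 4096-token sliding-window attention
```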
|
|
|
|
|
**Troubleshooting** |
|
|
|
If you see the following error: |
|
|
|
`KeyError: 'mistral'`
|
|
|
Or: |
|
|
|
`NotImplementedError: Cannot copy out of meta tensor; no data!`
|
|
|
Ensure you are using a stable version of Transformers, 4.34.0 or newer; earlier releases do not recognize the `mistral` model type.
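A minimal version check, assuming `transformers` is already installed (`packaging` ships as one of its dependencies):

```python
import transformers
from packaging.version import Version

# Support for the "mistral" model type landed in transformers 4.34.0.
if Version(transformers.__version__) < Version("4.34.0"):
    raise RuntimeError(
        "transformers too old; upgrade with: pip install -U 'transformers>=4.34.0'"
    )
```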
|
|
|
|
|
**Notice** |
|
|
|
Mistral 7B is a pretrained base model and therefore does not have any moderation mechanisms. |
|
|