
Model Details

The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.

Model Developers Mistral AI.

Variations None.

Input Text only.

Output Text only.

Model Architecture Mistral-7B-v0.1 is a transformer model with the following architecture choices (illustrated in the sketch after this list):

  • Grouped-Query Attention
  • Sliding-Window Attention
  • Byte-fallback BPE tokenizer
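
These choices are visible in the released checkpoint's configuration. Below is a minimal inspection sketch, assuming the model is published on the Hugging Face Hub as mistralai/Mistral-7B-v0.1; it is not part of this card:

```python
# Minimal sketch (an illustration, not part of this card): inspecting the
# architecture choices via the Hugging Face transformers config.
from transformers import AutoConfig, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed Hub id
config = AutoConfig.from_pretrained(model_id)

# Grouped-Query Attention: fewer key/value heads than query heads.
print("query heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)

# Sliding-Window Attention: each token attends to a fixed-size local window.
print("sliding window:", config.sliding_window)

# Byte-fallback BPE tokenizer: characters outside the vocabulary decompose
# into raw byte pieces, so any input string tokenizes without <unk>.
tokenizer = AutoTokenizer.from_pretrained(model_id)
print(tokenizer.tokenize("⇒"))  # rare symbols may fall back to byte tokens
```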

Model Dates Mistral-7B-v0.1 was trained between June and September 2023.

Status This is a static model. Future models will have new version numbers.

License Apache 2.0 license.

Research Paper Coming soon.

Where to send questions or comments about the model TODO: How do people send comments?

Intended Use

Intended Use Cases Mistral-7B-v0.1 is intended for commercial and research use. As a pretrained base model, it can be fine-tuned or prompted for a variety of natural language generation tasks.
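
A minimal usage sketch, not an official quickstart: prompting the base model for free-form completion with the transformers library (the Hub id mistralai/Mistral-7B-v0.1 is an assumption here):

```python
# Minimal usage sketch (not from this card): free-form text completion.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# Base models complete text rather than follow instructions, so phrase the
# prompt as a passage to continue.
inputs = tokenizer("The mistral is a wind that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=True, temperature=0.7)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```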

Evaluation Results

We report standard benchmark results for Mistral-7B-v0.1 below. The results were produced with our custom evaluation library.

Benchmark           Mistral-7B-v0.1 (7B)
hellaswag           81.19%
winogrande          75.53%
piqa                82.92%
boolq               83.52%
arc_easy            80.01%
arc_challenge       55.38%
naturalqs           23.96%
naturalqs_5shot     28.92%
triviaqa_5shot      69.88%
triviaqa            63.22%
humaneval_pass@1    29.88%
mbpp_pass@1         47.86%
mmlu                59.99%
math                11.94%
gsm8k               39.35%
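
The custom evaluation library is not public. As a purely illustrative stand-in (an assumption, not the tooling behind the table above), a subset of these tasks can be scored with EleutherAI's lm-evaluation-harness:

```python
# Illustrative only: the scores above come from a custom internal library, not
# this tool. This sketch uses EleutherAI's lm-evaluation-harness
# (pip install lm-eval); exact numbers will differ.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=mistralai/Mistral-7B-v0.1,dtype=float16",
    tasks=["hellaswag", "winogrande", "piqa", "arc_easy", "arc_challenge"],
    num_fewshot=0,  # these commonsense tasks are reported 0-shot below
)
for task, metrics in results["results"].items():
    print(task, metrics)
```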

Theme-based grouping

  • Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.

  • World Knowledge: 5-shot average of NaturalQuestions and TriviaQA (recomputed in the sketch after this list).

  • Reading Comprehension: 0-shot average of BoolQ and QuAC.

  • Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4.

  • Code: Average of 0-shot HumanEval and 3-shot MBPP.

  • Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGI Eval (English multiple-choice questions only).
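
As a quick sanity check, the groupings whose component scores all appear in the table above can be recomputed directly; a minimal sketch:

```python
# Recomputing the theme averages that use only scores listed in the table
# above. Other themes (e.g. Commonsense Reasoning) include benchmarks such as
# SIQA and CommonsenseQA whose individual scores are not in this card, so they
# cannot be recomputed from it.
scores = {
    "naturalqs_5shot": 28.92,
    "triviaqa_5shot": 69.88,
    "humaneval_pass@1": 29.88,  # 0-shot
    "mbpp_pass@1": 47.86,       # 3-shot
}

world_knowledge = (scores["naturalqs_5shot"] + scores["triviaqa_5shot"]) / 2
code = (scores["humaneval_pass@1"] + scores["mbpp_pass@1"]) / 2

print(f"World Knowledge: {world_knowledge:.2f}%")  # 49.40%
print(f"Code:            {code:.2f}%")             # 38.87%
```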

Ethical Considerations and Limitations

TODO: what do we say here?