Bam4d committed
Commit eee9341
1 Parent(s): f592c5f

Update small model card

Files changed (1)
  1. README.md +13 -44
README.md CHANGED
@@ -1,52 +1,21 @@
- # **Model Details**

- The Mistral AI-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters. Mistral AI-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.

- **Model Developers** Mistral AI.

- **Variations** None.

- **Input** Text only.
-
- **Output** Text only.
-
- **Model Architecture** Mistral AI-7B-v0.1 is a transformer model, with the following architecture choices:
  - Grouped-Query Attention
  - Sliding-Window Attention
  - Byte-fallback BPE tokenizer

- **Model Dates** Mistral AI-7B-v0.1 was trained between June and September 2023.
-
- **Status** This is a static model. Future models will have new version numbers.
-
- **License** Apache 2.0 license.
-
- **Research Paper** TODO: Coming soon.
-
- **Where to send questions or comments about the model** TODO: How do people send comments?
-
- # **Intended Use**
- **Intended Use Cases** Mistral AI-7B-v0.1 is for commercial and research use. It can be adapted for a variety of natural language generation tasks.
-
- # **Evaluation Results**
- We report the standard benchmark results for Mistral AI-7B-v0.1. We use a custom evaluation library to produce the results.
-
- | Model | Size | hellaswag | winogrande | piqa | boolq | arc_easy | arc_challenge | naturalqs | naturalqs_5shot | triviaqa_5shot | triviaqa | humaneval_pass@1 | mbpp_pass@1 | mmlu | math | gsm8k |
- |-----------------|------|-----------|------------|--------|--------|----------|---------------|-----------|-----------------|----------------|----------|------------------|-------------|--------|--------|--------|
- | Mistral-7B-v0.1 | 7B | 81.19% | 75.53% | 82.92% | 83.52% | 80.01% | 55.38% | 23.96% | 28.92% | 69.88% | 63.22% | 29.88% | 47.86% | 59.99% | 11.94% | 39.35% |
-
- **Theme-based grouping**
- - Commonsense Reasoning: 0-shot average of Hellaswag, Winogrande, PIQA, SIQA, OpenbookQA, ARC-Easy, ARC-Challenge, and CommonsenseQA.
-
- - World Knowledge: 5-shot average of NaturalQuestions and TriviaQA.
-
- - Reading Comprehension: 0-shot average of BoolQ and QuAC.
-
- - Math: Average of 8-shot GSM8K with maj@8 and 4-shot MATH with maj@4
-
- - Code: Average of 0-shot Humaneval and 3-shot MBPP
-
- - Popular aggregated results: 5-shot MMLU, 3-shot BBH, and 3-5-shot AGI Eval (English multiple-choice questions only)
-
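To make the theme-based grouping above concrete, the sketch below averages the 0-shot commonsense scores that do appear in the benchmark table and illustrates what maj@k voting means for the math tasks. It is only an illustration: the card's Commonsense Reasoning grouping also includes SIQA, OpenbookQA, and CommonsenseQA (not listed in the table), and `majority_vote` is a hypothetical helper, not part of the evaluation library mentioned above.

```python
from collections import Counter

# 0-shot commonsense accuracies copied from the benchmark table above (percent).
# The card's "Commonsense Reasoning" grouping also averages SIQA, OpenbookQA and
# CommonsenseQA, which the table does not list, so this mean is illustrative only.
commonsense_scores = {
    "hellaswag": 81.19,
    "winogrande": 75.53,
    "piqa": 82.92,
    "arc_easy": 80.01,
    "arc_challenge": 55.38,
}
theme_average = sum(commonsense_scores.values()) / len(commonsense_scores)
print(f"Illustrative commonsense average: {theme_average:.2f}%")

# maj@k (GSM8K maj@8, MATH maj@4) scores the most frequent of k sampled answers.
def majority_vote(sampled_answers):
    """Return the most common final answer among k sampled completions."""
    return Counter(sampled_answers).most_common(1)[0][0]

# Hypothetical example: 8 sampled final answers for one GSM8K problem.
print(majority_vote(["42", "42", "41", "42", "40", "42", "42", "7"]))
```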
- # **Ethical Considerations and Limitations**
- TODO: what do we say here?
 
+ ---
+ license: apache-2.0
+ pipeline_tag: text-generation
+ ---

+ # Model Card for Mistral-7B-v0.1

+ The Mistral-7B-v0.1 Large Language Model (LLM) is a pretrained generative text model with 7 billion parameters.
+ Mistral-7B-v0.1 outperforms Llama 2 13B on all benchmarks we tested.

+ For full details of this model, please read our [Release blog post](https://mistral.ai/news/announcing-mistral-7b-v0.1/).

+ ## Model Architecture
+ Mistral-7B-v0.1 is a transformer model with the following architecture choices:
  - Grouped-Query Attention
  - Sliding-Window Attention
  - Byte-fallback BPE tokenizer
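For readers who want to see how these architecture choices surface in practice, here is a minimal loading sketch. It assumes the Hugging Face `transformers` library (a version with Mistral support) and the `mistralai/Mistral-7B-v0.1` repository id; the config attribute names are `transformers` conventions rather than anything stated in this card.

```python
# Minimal sketch, assuming `transformers` (with Mistral support) and the
# `mistralai/Mistral-7B-v0.1` checkpoint; the weights are several GB, so this
# is illustrative rather than something to run casually.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-7B-v0.1"

config = AutoConfig.from_pretrained(model_id)
# Grouped-query attention: fewer key/value heads than query heads.
print("attention heads:", config.num_attention_heads)
print("key/value heads:", config.num_key_value_heads)
# Sliding-window attention: each token attends within a fixed local window.
print("sliding window:", config.sliding_window)

# Byte-fallback BPE tokenizer (SentencePiece-based) and the model itself.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")

inputs = tokenizer("My favourite condiment is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```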
 
+ ## Model Developers
+ The Mistral AI Team:
+ Albert Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Teven Le Scao, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed.