Update README.md

---
license: apache-2.0
---

Mistral-NeMo is a 12B-parameter Large Language Model (LLM) trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.

**Key features**

- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
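
To make the above concrete, here is a minimal usage sketch with the `transformers` pipeline API. The repo id is an assumption for illustration; check the Hugging Face Hub for the exact names of the pre-trained and instructed checkpoints.

```python
# Minimal usage sketch. The repo id below is assumed, not confirmed by this
# card; verify the exact checkpoint name on the Hugging Face Hub.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed instruct repo id
    device_map="auto",  # a 12B model generally needs a GPU (or sharded weights)
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in two sentences."}]
print(chat(messages, max_new_tokens=128)[0]["generated_text"])
```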

---
Model Architecture

Mistral-NeMo is a transformer model with the following architecture choices:

- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k
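
For readers who want these hyperparameters in code, the sketch below expresses them as a `transformers`-style `MistralConfig` (Mistral-NeMo loads as a Mistral-family model). This is an illustration, not the shipped `config.json`, and it assumes a `transformers` version recent enough for `MistralConfig` to accept `head_dim`.

```python
# Sketch: the architecture choices above as a transformers MistralConfig.
# Illustrative only; not the official config.json of the released checkpoints.
from transformers import MistralConfig

config = MistralConfig(
    num_hidden_layers=40,      # Layers
    hidden_size=5120,          # Dim
    head_dim=128,              # Head dim (not hidden_size / num_heads, which is 160)
    intermediate_size=14336,   # Hidden dim of the SwiGLU feed-forward block
    hidden_act="silu",         # gating activation used by SwiGLU
    num_attention_heads=32,    # Number of heads
    num_key_value_heads=8,     # Number of kv-heads (GQA)
    rope_theta=1_000_000.0,    # rotary embeddings, theta = 1M
    vocab_size=2**17,          # 131,072 ~= 128k tokens
)
print(config.vocab_size)  # 131072
```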

---

Main benchmarks

- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

Multilingual benchmarks

MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%