---
license: apache-2.0
---

Mistral-NeMo is a Large Language Model (LLM) with 12B parameters, trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.

**Key features**
- Released under the Apache 2.0 License
- Pre-trained and instruction-tuned versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
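
Both versions can be used like any other causal language model. Below is a minimal loading sketch with Hugging Face `transformers`, assuming a transformers-compatible checkpoint; the model id `nvidia/Mistral-NeMo-12B-Base` is an illustrative assumption, so substitute the actual repository id.

```python
# Minimal sketch, not an official snippet. Assumptions: the model id below
# (hypothetical) and a GPU with enough memory for a 12B model in bf16 (~24 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-12B-Base"  # illustrative id; replace as needed

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

inputs = tokenizer("Mistral-NeMo is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```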

---

**Model Architecture**

Mistral-NeMo is a transformer model with the following architecture choices:

- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,436
- Activation function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2^17 ≈ 128k
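
To see how these numbers fit together, here is a minimal sketch of the per-layer attention shapes implied by grouped-query attention (GQA). It is illustrative only, built from the figures above; the tensor and weight names are invented, not the model's implementation.

```python
# Illustrative GQA shape check using the architecture numbers listed above.
import torch

dim, head_dim = 5120, 128
n_heads, n_kv_heads = 32, 8          # 8 kv-heads shared across 32 query heads
group_size = n_heads // n_kv_heads   # 4 query heads per kv-head

batch, seq = 1, 2048                 # arbitrary example sizes
x = torch.randn(batch, seq, dim)

# Projection weights (hypothetical names; biases omitted).
wq = torch.randn(dim, n_heads * head_dim)     # 5120 -> 4096
wk = torch.randn(dim, n_kv_heads * head_dim)  # 5120 -> 1024
wv = torch.randn(dim, n_kv_heads * head_dim)  # 5120 -> 1024

q = (x @ wq).view(batch, seq, n_heads, head_dim)
k = (x @ wk).view(batch, seq, n_kv_heads, head_dim)
v = (x @ wv).view(batch, seq, n_kv_heads, head_dim)

# Each kv-head is repeated group_size times so shapes line up with q.
k = k.repeat_interleave(group_size, dim=2)
v = v.repeat_interleave(group_size, dim=2)
print(q.shape, k.shape, v.shape)  # all (1, 2048, 32, 128)
```

Two consequences of these choices: the key/value cache is 4× smaller than with full multi-head attention, which matters at a 128k context window, and since 32 heads × 128 head dim = 4,096, the attention block projects the 5,120-wide residual stream down to 4,096 and back up on its output.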

---

**Main benchmarks**

- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

**Multilingual benchmarks**

MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%