Update README.md

---
license: apache-2.0
---

Mistral-NeMo is a 12B-parameter Large Language Model (LLM) trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.

**Key features**

- Released under the Apache 2 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Trained on a large proportion of multilingual and code data
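
To make the above concrete, here is a minimal usage sketch with the `transformers` pipeline API. The repo id is an assumption for illustration; check the Hugging Face Hub for the exact names of the pre-trained and instructed checkpoints.

```python
# Minimal usage sketch. The repo id below is assumed, not confirmed by this
# card; verify the exact checkpoint name on the Hugging Face Hub.
from transformers import pipeline

chat = pipeline(
    "text-generation",
    model="mistralai/Mistral-Nemo-Instruct-2407",  # assumed instruct repo id
    device_map="auto",  # a 12B model generally needs a GPU (or sharded weights)
)

messages = [{"role": "user", "content": "Summarize grouped-query attention in two sentences."}]
print(chat(messages, max_new_tokens=128)[0]["generated_text"])
```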

---
Model Architecture

Mistral-NeMo is a transformer model with the following architecture choices:

- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k
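
For readers who want these hyperparameters in code, the sketch below expresses them as a `transformers`-style `MistralConfig` (Mistral-NeMo loads as a Mistral-family model). This is an illustration, not the shipped `config.json`, and it assumes a `transformers` version recent enough for `MistralConfig` to accept `head_dim`.

```python
# Sketch: the architecture choices above as a transformers MistralConfig.
# Illustrative only; not the official config.json of the released checkpoints.
from transformers import MistralConfig

config = MistralConfig(
    num_hidden_layers=40,      # Layers
    hidden_size=5120,          # Dim
    head_dim=128,              # Head dim (not hidden_size / num_heads, which is 160)
    intermediate_size=14336,   # Hidden dim of the SwiGLU feed-forward block
    hidden_act="silu",         # gating activation used by SwiGLU
    num_attention_heads=32,    # Number of heads
    num_key_value_heads=8,     # Number of kv-heads (GQA)
    rope_theta=1_000_000.0,    # rotary embeddings, theta = 1M
    vocab_size=2**17,          # 131,072 ~= 128k tokens
)
print(config.vocab_size)  # 131072
```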

---

Main benchmarks

- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%

Multilingual benchmarks

MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%