NeMo
nvidia
shrimai19 committed 745cc89
1 Parent(s): 48f5485

Update README.md

Files changed (1)
  1. README.md +66 -3
README.md CHANGED
@@ -1,3 +1,66 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ tags:
+ - nvidia
+ ---
+
+ ## Mistral-NeMo-12B-Instruct
+
+ [![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Decoder-green)](#model-architecture)[![Model size](https://img.shields.io/badge/Params-12B-green)](#model-architecture)[![Language](https://img.shields.io/badge/Language-Multilingual-green)](#datasets)
+
+ ### Model Overview
+
+ Mistral-NeMo-12B-Instruct is a Large Language Model (LLM) with 12B parameters, trained jointly by NVIDIA and Mistral AI. It significantly outperforms existing models of smaller or similar size.
+
+ **Key features**
+ - Released under the Apache 2.0 License
+ - Pre-trained and instruction-tuned versions
+ - Trained with a 128k context window
+ - Comes with an FP8 quantized version with no accuracy loss
+ - Trained on a large proportion of multilingual and code data
+
+ ### Intended Use
+
+ Mistral-NeMo-12B-Instruct is a chat model intended for use with the English language.
+
+ The instruct model can be further customized with the [NeMo Framework](https://docs.nvidia.com/nemo-framework/index.html) suite of customization tools, including Parameter-Efficient Fine-Tuning (P-tuning, Adapters, LoRA, and more) and Model Alignment (SFT, SteerLM, RLHF, and more) via [NeMo-Aligner](https://github.com/NVIDIA/NeMo-Aligner). Refer to the [documentation](https://docs.nvidia.com/nemo-framework/user-guide/latest/llms/nemotron/index.html) for examples.
+
+ **Model Developer:** [NVIDIA](https://www.nvidia.com/en-us/) and [Mistral AI](https://mistral.ai/)
+
+ **Model Dates:** Mistral-NeMo-12B-Instruct was trained between June 2024 and July 2024.
+
+ **Data Freshness:** The pretraining data has a cutoff of April 2024.
+
+ ### Model Architecture
+
+ Mistral-NeMo-12B-Instruct is a transformer model with the following architecture choices:
+
+ - Layers: 40
+ - Dim: 5,120
+ - Head dim: 128
+ - Hidden dim: 14,336
+ - Activation Function: SwiGLU
+ - Number of heads: 32
+ - Number of kv-heads: 8 (GQA)
+ - Rotary embeddings (theta = 1M)
+ - Vocabulary size: 2^17 = 131,072 (~128k)
+
+ **Architecture Type:** Transformer Decoder (auto-regressive language model)
+
+ ### Evaluation Results
+
+ - MT-Bench (dev): 7.84
+ - MixEval Hard: 0.534
+ - IFEval-v5: 0.629
+ - WildBench: 42.57
+
+ ### Limitations
+
+ The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when given toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output even if the prompt itself does not include anything explicitly offensive.
+
+ ### Ethical Considerations
+
+ NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloading or using this model in accordance with our terms of service, developers should work with their internal model team to ensure it meets the requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
+
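The architecture list in the README above can be sanity-checked with simple arithmetic. The hyperparameters should add up to roughly 12B weights, and grouped-query attention (8 KV heads instead of 32) should shrink the per-sequence KV cache fourfold. The Python sketch below is illustrative only and is not code from NVIDIA, Mistral AI, or NeMo. `DecoderConfig`, `param_count` and `kv_cache_bytes` are hypothetical names. The sketch makes four assumptions: linear layers have no biases, each RMSNorm carries only a weight vector, the input and output embeddings are untied, and the MLP hidden size is 14,336 (the `intermediate_size` in the public Mistral-NeMo `config.json`).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    """Hyperparameters from the architecture list; field names are hypothetical."""
    n_layers: int = 40
    dim: int = 5_120
    head_dim: int = 128
    hidden_dim: int = 14_336  # assumed MLP intermediate size (public config.json value)
    n_heads: int = 32
    n_kv_heads: int = 8  # GQA: 4 query heads share each K/V head
    vocab_size: int = 2**17
    tied_embeddings: bool = False  # assumption: separate input embedding and output head


def param_count(cfg: DecoderConfig) -> int:
    """Approximate weight count, assuming bias-free linear layers and RMSNorm weights only."""
    q_dim = cfg.n_heads * cfg.head_dim      # 32 * 128 = 4,096 (differs from dim)
    kv_dim = cfg.n_kv_heads * cfg.head_dim  # 8 * 128 = 1,024 with GQA
    attn = cfg.dim * q_dim + 2 * cfg.dim * kv_dim + q_dim * cfg.dim  # Wq, Wk, Wv, Wo
    mlp = 3 * cfg.dim * cfg.hidden_dim  # SwiGLU: gate, up and down projections
    norms = 2 * cfg.dim                 # pre-attention and pre-MLP RMSNorm
    per_layer = attn + mlp + norms
    embed = cfg.vocab_size * cfg.dim
    head = 0 if cfg.tied_embeddings else cfg.vocab_size * cfg.dim
    return cfg.n_layers * per_layer + embed + head + cfg.dim  # + final RMSNorm


def kv_cache_bytes(cfg: DecoderConfig, seq_len: int, bytes_per_value: int = 2) -> int:
    """K and V cache size for one sequence (bytes_per_value=2 for BF16, 1 for FP8)."""
    return 2 * cfg.n_layers * cfg.n_kv_heads * cfg.head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    cfg = DecoderConfig()
    print(f"params ~ {param_count(cfg) / 1e9:.2f}B")
    ctx = 128 * 1024  # the 128k context window
    gqa = kv_cache_bytes(cfg, ctx)
    mha = kv_cache_bytes(DecoderConfig(n_kv_heads=cfg.n_heads), ctx)  # one K/V head per query head
    print(f"KV cache @128k, BF16: GQA {gqa / 2**30:.0f} GiB vs MHA {mha / 2**30:.0f} GiB")
```

Run as a script, it prints `params ~ 12.25B` and `KV cache @128k, BF16: GQA 20 GiB vs MHA 80 GiB`. Passing `bytes_per_value=1` gives the FP8 cache size (10 GiB under the same assumptions). These figures cover weights and the KV cache only. They leave out activations and framework overhead, so read them as lower bounds on memory rather than deployment numbers.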