Tags: NeMo · Safetensors · llama
srvm committed on
Commit 40d82bc
1 parent: 9f317d2

Add link to tech report

Files changed (1): README.md +3 −2
README.md CHANGED
@@ -9,7 +9,7 @@ license_link: >-
 ## Model Overview
 
 Llama-3.1-Minitron-4B-Depth-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
-It is obtained by pruning Llama-3.1-8B; specifically, we prune the number of transformer blocks in the model. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
+It is obtained by pruning Llama-3.1-8B; specifically, we prune the number of transformer blocks in the model. Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose. Please refer to our [technical report](https://arxiv.org/abs/2408.11796) for more details.
 
 This model is ready for commercial use.
 
@@ -137,4 +137,5 @@ NVIDIA believes Trustworthy AI is a shared responsibility and we have establishe
 Please report security vulnerabilities or NVIDIA AI Concerns [here](https://www.nvidia.com/en-us/support/submit-security-vulnerability/).
 
 ## References
-* [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
+* [Compact Language Models via Pruning and Knowledge Distillation](https://arxiv.org/abs/2407.14679)
+* [LLM Pruning and Distillation in Practice: The Minitron Approach](https://arxiv.org/abs/2408.11796)
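The model card's claim that the 4B model is "obtained by pruning Llama-3.1-8B; specifically, we prune the number of transformer blocks" describes depth pruning: dropping whole decoder layers rather than shrinking them. A minimal toy sketch of the idea, not the actual Minitron implementation: the `prune_depth` helper, the string block list, and the kept indices are all hypothetical placeholders (the real method selects layers via importance estimation and operates on model weights).

```python
def prune_depth(blocks, keep_indices):
    """Depth-prune a model by keeping only the transformer blocks
    whose indices are in keep_indices. Hypothetical helper for
    illustration; real pruning rewires weight tensors, not strings."""
    return [blocks[i] for i in sorted(keep_indices)]

# Llama-3.1-8B has 32 decoder layers; a depth-pruned 4B variant
# keeps roughly half of them (placeholder indices shown here).
blocks = [f"block_{i}" for i in range(32)]
pruned = prune_depth(blocks, keep_indices=range(16))
print(len(pruned))  # → 16
```

After this structural step, the pruned model is healed by the continued training with distillation described in the diff (94B tokens against the original 8B teacher).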