Update README #6
README.md
@@ -9,7 +9,7 @@ license_link: >-
 ## Model Overview
 
 Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
-It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size
+It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size and MLP intermediate dimension.
 Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
 
 This model is ready for commercial use.
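
For context on the edited sentence: width pruning means shrinking the hidden (embedding) size and the MLP intermediate dimension of each transformer block, then copying over the surviving weight slices. Below is a minimal PyTorch sketch of trimming one gated-MLP block's intermediate dimension; the importance criterion (row norms of the down-projection) and the widths used (14336 kept down to 9216) are illustrative assumptions, not the exact Minitron procedure.

```python
import torch
import torch.nn as nn

# Hypothetical width pruning of a Llama-style gated MLP (SwiGLU):
# keep the top-k intermediate channels ranked by an importance score.
# The score here (L2 norm of down_proj columns) is an illustrative
# stand-in for the activation-based importance used in practice.

hidden, inter, inter_kept = 4096, 14336, 9216  # target width is assumed

gate_proj = nn.Linear(hidden, inter, bias=False)
up_proj = nn.Linear(hidden, inter, bias=False)
down_proj = nn.Linear(inter, hidden, bias=False)

# down_proj.weight has shape (hidden, inter): one column per channel.
importance = down_proj.weight.norm(dim=0)
keep = importance.topk(inter_kept).indices.sort().values

# Slice all three projections along the intermediate dimension.
pruned_gate = nn.Linear(hidden, inter_kept, bias=False)
pruned_gate.weight.data = gate_proj.weight.data[keep, :]

pruned_up = nn.Linear(hidden, inter_kept, bias=False)
pruned_up.weight.data = up_proj.weight.data[keep, :]

pruned_down = nn.Linear(inter_kept, hidden, bias=False)
pruned_down.weight.data = down_proj.weight.data[:, keep]
```

Pruning the embedding size works the same way, except the kept hidden channels must be sliced consistently across every layer that reads or writes the residual stream.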
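The "continued training with distillation" step trains the pruned student to match the teacher's (Llama-3.1-8B's) output distribution over the 94B-token corpus. A minimal sketch of a logit-distillation loss under standard assumptions (forward KL on temperature-softened logits); the temperature and any weighting against the language-modeling loss are hypothetical, as the card does not specify them.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged over tokens.

    Logits are (batch, seq_len, vocab). The temperature is an assumed
    hyperparameter, not taken from the model card.
    """
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # Flatten (batch, seq) into rows; batchmean divides by the row count.
    # The t^2 factor is the standard Hinton-style gradient rescaling.
    return F.kl_div(student_logp.flatten(0, 1),
                    teacher_p.flatten(0, 1),
                    reduction="batchmean") * (t * t)
```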