srvm committed on
Commit
a60471e
1 Parent(s): e28186c

Update README #6

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -9,7 +9,7 @@ license_link: >-
 ## Model Overview
 
 Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
-It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size, number of attention heads, and MLP intermediate dimension.
+It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size and MLP intermediate dimension.
 Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
 
 This model is ready for commercial use.
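The corrected sentence describes width pruning: shrinking the MLP intermediate dimension (and embedding size) rather than removing attention heads. A minimal sketch of what pruning an MLP's intermediate dimension looks like at the weight-matrix level is below; the shapes and the magnitude-based channel-selection criterion are illustrative assumptions, not NVIDIA's actual Minitron procedure.

```python
import numpy as np

def prune_mlp_intermediate(w_up, w_down, keep):
    """Width-prune an MLP block: keep the `keep` intermediate channels
    with the largest combined weight magnitude (illustrative criterion).

    w_up:   (hidden, intermediate) projection into the MLP
    w_down: (intermediate, hidden) projection back to the residual stream
    """
    # Score each intermediate channel by its L2 norm across both matrices.
    scores = np.linalg.norm(w_up, axis=0) + np.linalg.norm(w_down, axis=1)
    # Keep the top-scoring channels, preserving their original order.
    idx = np.sort(np.argsort(scores)[-keep:])
    return w_up[:, idx], w_down[idx, :]

rng = np.random.default_rng(0)
hidden, intermediate = 8, 32
w_up = rng.normal(size=(hidden, intermediate))
w_down = rng.normal(size=(intermediate, hidden))

# Halve the intermediate dimension; hidden size is untouched.
w_up_p, w_down_p = prune_mlp_intermediate(w_up, w_down, keep=16)
print(w_up_p.shape, w_down_p.shape)  # (8, 16) (16, 8)
```

Because both projections are sliced along the same channel indices, the pruned block still composes into a valid hidden-to-hidden map; as the README notes, continued training with distillation is then used to recover quality.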