Update README #6
README.md
@@ -9,7 +9,7 @@ license_link: >-
 ## Model Overview
 
 Llama-3.1-Minitron-4B-Width-Base is a base text-to-text model that can be adopted for a variety of natural language generation tasks.
-It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size
+It is obtained by pruning Llama-3.1-8B; specifically, we prune model embedding size and MLP intermediate dimension.
 Following pruning, we perform continued training with distillation using 94 billion tokens to arrive at the final model; we use the continuous pre-training data corpus used in Nemotron-4 15B for this purpose.
 
 This model is ready for commercial use.
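
For context on the edited sentence: width pruning means shrinking the hidden (embedding) size and the MLP intermediate dimension of each transformer block, then copying over the surviving weight slices. Below is a minimal PyTorch sketch of trimming one gated-MLP block's intermediate dimension; the importance criterion (row norms of the down-projection) and the widths used (14336 kept down to 9216) are illustrative assumptions, not the exact Minitron procedure.

```python
import torch
import torch.nn as nn

# Hypothetical width pruning of a Llama-style gated MLP (SwiGLU):
# keep the top-k intermediate channels ranked by an importance score.
# The score here (L2 norm of down_proj columns) is an illustrative
# stand-in for the activation-based importance used in practice.

hidden, inter, inter_kept = 4096, 14336, 9216  # target width is assumed

gate_proj = nn.Linear(hidden, inter, bias=False)
up_proj = nn.Linear(hidden, inter, bias=False)
down_proj = nn.Linear(inter, hidden, bias=False)

# down_proj.weight has shape (hidden, inter): one column per channel.
importance = down_proj.weight.norm(dim=0)
keep = importance.topk(inter_kept).indices.sort().values

# Slice all three projections along the intermediate dimension.
pruned_gate = nn.Linear(hidden, inter_kept, bias=False)
pruned_gate.weight.data = gate_proj.weight.data[keep, :]

pruned_up = nn.Linear(hidden, inter_kept, bias=False)
pruned_up.weight.data = up_proj.weight.data[keep, :]

pruned_down = nn.Linear(inter_kept, hidden, bias=False)
pruned_down.weight.data = down_proj.weight.data[:, keep]
```

Pruning the embedding size works the same way, except the kept hidden channels must be sliced consistently across every layer that reads or writes the residual stream.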
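The "continued training with distillation" step trains the pruned student to match the teacher's (Llama-3.1-8B's) output distribution over the 94B-token corpus. A minimal sketch of a logit-distillation loss under standard assumptions (forward KL on temperature-softened logits); the temperature and any weighting against the language-modeling loss are hypothetical, as the card does not specify them.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits: torch.Tensor,
                      teacher_logits: torch.Tensor,
                      temperature: float = 1.0) -> torch.Tensor:
    """KL(teacher || student) over the vocabulary, averaged over tokens.

    Logits are (batch, seq_len, vocab). The temperature is an assumed
    hyperparameter, not taken from the model card.
    """
    t = temperature
    student_logp = F.log_softmax(student_logits / t, dim=-1)
    teacher_p = F.softmax(teacher_logits / t, dim=-1)
    # Flatten (batch, seq) into rows; batchmean divides by the row count.
    # The t^2 factor is the standard Hinton-style gradient rescaling.
    return F.kl_div(student_logp.flatten(0, 1),
                    teacher_p.flatten(0, 1),
                    reduction="batchmean") * (t * t)
```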