BramVanroy committed on
Commit
1e5a668
1 Parent(s): b106433

Update README.md

  1. README.md +2 -1
README.md CHANGED
@@ -33,7 +33,8 @@ wanted to see if the performance would be reasonable after finetuning this model
 Trained on the [yhavinga/mc4_nl_cleaned](https://huggingface.co/datasets/yhavinga/mc4_nl_cleaned/viewer/tiny/train) dataset (`tiny` partition) for one epoch. The canonical
 validation split was not used but instead 5% of `train` was used as validation.

-At 2048 tokens context length, the training set was around 2M (2,008,858) samples, and the model was trained for 1 epoch.
+At 2048 tokens context length, the training set was around 2M (2,008,858) samples, and the model was trained for 1 epoch. That means that the model was trained for
+around 4B Dutch tokens (`2048 * 2008858 = 4.114.141.184`).


 ## Training procedure
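
The token count quoted in the README lines added by this commit can be checked with a quick calculation. The sketch below is illustrative only: the variable names are made up here, and it assumes every training sample is filled to the full 2048-token context length, so the result is best read as an upper bound rather than an exact count.

```python
# Figures taken from the README text in this commit.
CONTEXT_LENGTH = 2048      # tokens per training sample
NUM_SAMPLES = 2_008_858    # samples in the training set
EPOCHS = 1                 # the model was trained for one epoch

# Assumption: every sample is packed to the full context length.
total_tokens = CONTEXT_LENGTH * NUM_SAMPLES * EPOCHS

print(f"{total_tokens:,}")           # 4,114,141,184
print(f"{total_tokens / 1e9:.2f}B")  # 4.11B
```

This matches the "around 4B Dutch tokens" figure stated in the README.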