gugarosa committed
Commit 0ef6675
1 Parent(s): df8e23b

Update README.md

Files changed (1)
  README.md +1 -1
README.md CHANGED
@@ -72,7 +72,7 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 
 ### Training Data
 
-The models have been trained for only 7.8B tokens from [The Pile](https://huggingface.co/datasets/the_pile) dataset. Such number might imply in repetitive text when generating a large amount of tokens (32+ tokens).
+The models have been trained for only 7.8B tokens from the [The Pile](https://huggingface.co/datasets/the_pile) dataset, which is roughly 10-15x more than the suggestion in "Training Compute-Optimal Large Language Models". However, since these are small models (100M parameters or fewer), they might still produce repetitive text when generating a large number of tokens from a short context.
 
 ### Training Procedure
 
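
The updated note warns that these small models can drift into repetitive text when asked for many tokens from a short prompt. Below is a minimal, illustrative sketch of one way to mitigate this with standard Hugging Face `transformers` generation options such as `no_repeat_ngram_size` and `repetition_penalty`; the model id is a placeholder assumption, not taken from this repository.

```python
# Illustrative sketch, assuming the checkpoint loads as a standard causal LM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gugarosa/example-checkpoint"  # hypothetical id, replace with the actual model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The Pile is a large, diverse dataset", return_tensors="pt")
generated_ids = model.generate(
    **inputs,
    max_new_tokens=64,          # keep generations short, as the README note suggests
    do_sample=True,
    top_p=0.95,
    no_repeat_ngram_size=3,     # block verbatim 3-gram repeats
    repetition_penalty=1.2,     # down-weight recently generated tokens
)
print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
```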