Update README.md
README.md
@@ -72,7 +72,7 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 
 ### Training Data
 
-The models have been trained for only 7.8B tokens from [The Pile](https://huggingface.co/datasets/the_pile) dataset.
+The models have been trained for only 7.8B tokens from the [The Pile](https://huggingface.co/datasets/the_pile) dataset, which is roughly 10-15x more than the compute-optimal budget suggested by "Training Compute-Optimal Large Language Models". However, since these are small models (100M parameters or fewer), they may produce repetitive text when generating a large number of tokens from a short context.
 
 ### Training Procedure
 
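For context on the added sentence, here is a minimal sketch of how the "10-15x" figure can be sanity-checked, assuming the roughly 20 tokens-per-parameter heuristic from "Training Compute-Optimal Large Language Models" (Chinchilla); the parameter counts used are hypothetical, since the exact model sizes are not stated in this change:

```python
# Rough sanity check of the "~10-15x more than compute-optimal" statement,
# using the ~20 tokens-per-parameter rule of thumb from the Chinchilla paper.
# The parameter counts below are hypothetical examples, not the actual model sizes.
trained_tokens = 7.8e9

for params in (25e6, 50e6, 100e6):
    optimal_tokens = 20 * params                  # Chinchilla heuristic
    ratio = trained_tokens / optimal_tokens
    print(f"{params / 1e6:.0f}M params: "
          f"optimal ~= {optimal_tokens / 1e9:.1f}B tokens, "
          f"trained/optimal ~= {ratio:.1f}x")
```

The resulting ratio depends strongly on the exact parameter count, so the figure is best read as an order-of-magnitude estimate.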