Update README.md
README.md
@@ -72,7 +72,7 @@ print(tokenizer.decode(generated_ids[0], skip_special_tokens=True))
 
 ### Training Data
 
-The models have been trained for only 7.8B tokens from [The Pile](https://huggingface.co/datasets/the_pile) dataset.
+The models have been trained for only 7.8B tokens from the [The Pile](https://huggingface.co/datasets/the_pile) dataset, which is roughly 10-15x more than the compute-optimal budget suggested by "Training Compute-Optimal Large Language Models". However, since these are small models (100M parameters or fewer), they may produce repetitive text when generating a large number of tokens from a short context.
 
 ### Training Procedure
 
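For context on the added sentence, here is a minimal sketch of how the "10-15x" figure can be sanity-checked, assuming the roughly 20 tokens-per-parameter heuristic from "Training Compute-Optimal Large Language Models" (Chinchilla); the parameter counts used are hypothetical, since the exact model sizes are not stated in this change:

```python
# Rough sanity check of the "~10-15x more than compute-optimal" statement,
# using the ~20 tokens-per-parameter rule of thumb from the Chinchilla paper.
# The parameter counts below are hypothetical examples, not the actual model sizes.
trained_tokens = 7.8e9

for params in (25e6, 50e6, 100e6):
    optimal_tokens = 20 * params                  # Chinchilla heuristic
    ratio = trained_tokens / optimal_tokens
    print(f"{params / 1e6:.0f}M params: "
          f"optimal ~= {optimal_tokens / 1e9:.1f}B tokens, "
          f"trained/optimal ~= {ratio:.1f}x")
```

The resulting ratio depends strongly on the exact parameter count, so the figure is best read as an order-of-magnitude estimate.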