crumb committed on
Commit
707eacf
1 Parent(s): 6320b76

Update README.md

Files changed (1)
  1. README.md +20 -3
README.md CHANGED
@@ -21,10 +21,27 @@ This model is trained with the specific purpose of generating short narratives u
 
 Learning from text generated by Flan-UL2 (20b), the model adopts a simple storyline layout and a minimalistic vocabulary, which it recognizes are easier to learn and replicate.
 
-## Training Data
+## Training
 
-The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset, created with the help of Flan-UL2 (20b). The data is designed to follow the format of a simple, first-grader-level narrative, which aids the model in learning simple vocabulary and sentence structure.
+The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset (inspired by [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)), created with the help of Flan-UL2 (20b), as opposed to the GPT-3.5/4 used in the original TinyStories. The data is designed to follow the format of a simple, first-grader-level narrative, which aids the model in learning simple vocabulary and sentence structure.
+
+Training arguments:
+
+```
+per_device_train_batch_size=32,
+gradient_accumulation_steps=4,
+warmup_steps=128,
+num_train_epochs=4,
+learning_rate=2e-4,
+bf16=True,
+eval_steps=64,
+optim="adamw_torch",
+```
 
 ## Usage
 
-This model serves as a meaningful research tool in exploring the learning tendencies of smaller language models and their ability to grasp simplified language constructs. Its specific training set effectively maps the idea that a constrained vocabulary and simplistic story layouts are inherently easier to learn.
+This model serves as a meaningful research tool in exploring the learning tendencies of smaller language models and their ability to grasp simplified language constructs. Its specific training set effectively maps the idea that a constrained vocabulary and simplistic story layouts are inherently easier to learn.
+
+## Validation and Performance
+
+The model's performance was evaluated on a held-out validation set comprising 1% of the original dataset, chosen to provide an unbiased measure of the model's ability to generalize to unseen data. During evaluation, the model achieved a loss of "".
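
The argument names in the committed snippet match the parameters of `transformers.TrainingArguments`, so the training setup likely looked something like the sketch below; the `output_dir` and `evaluation_strategy` values are assumptions, not part of the commit:

```python
from transformers import TrainingArguments

# Minimal sketch: the committed hyperparameters dropped into
# transformers.TrainingArguments. output_dir and evaluation_strategy
# are assumptions, not taken from the commit.
training_args = TrainingArguments(
    output_dir="tinystories-model",   # assumed output path
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,    # effective batch of 128 per device
    warmup_steps=128,
    num_train_epochs=4,
    learning_rate=2e-4,
    bf16=True,                        # bfloat16 mixed precision
    evaluation_strategy="steps",      # assumed, so eval_steps takes effect
    eval_steps=64,
    optim="adamw_torch",
)
```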
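
The 1% held-out validation set described in the new Validation and Performance section could be produced with the `datasets` library along these lines; the split method and seed are assumptions, as the commit does not state how the split was made:

```python
from datasets import load_dataset

# Sketch of a 1% held-out validation split of the training dataset.
# train_test_split and seed=42 are assumptions, not from the commit.
dataset = load_dataset("crumb/flan-ul2-tinystories", split="train")
splits = dataset.train_test_split(test_size=0.01, seed=42)
train_data, validation_data = splits["train"], splits["test"]
```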