crumb committed on
Commit
707eacf
1 Parent(s): 6320b76

Update README.md

Files changed (1)
  1. README.md +20 -3
README.md CHANGED
@@ -21,10 +21,27 @@ This model is trained with the specific purpose of generating short narratives u
 
 Learning from text generated by Flan-UL2 (20b), the model adopts a simple storyline layout and a minimalistic vocabulary, which it recognizes are easier to learn and replicate.
 
-## Training Data
+## Training
 
-The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset, created with the help of Flan-UL2 (20b). The data is designed to follow the format of a simple, first-grader-level narrative, which aids the model in learning simple vocabulary and sentence structure.
+The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset (inspired by [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)), created with the help of Flan-UL2 (20b), as opposed to the GPT-3.5/4 used in the original TinyStories. The data is designed to follow the format of a simple, first-grader-level narrative, which aids the model in learning simple vocabulary and sentence structure.
+
+Training arguments:
+
+```
+per_device_train_batch_size=32,
+gradient_accumulation_steps=4,
+warmup_steps=128,
+num_train_epochs=4,
+learning_rate=2e-4,
+bf16=True,
+eval_steps=64,
+optim="adamw_torch",
+```
 
 ## Usage
 
-This model serves as a meaningful research tool in exploring the learning tendencies of smaller language models and their ability to grasp simplified language constructs. Its specific training set effectively maps the idea that a constrained vocabulary and simplistic story layouts are inherently easier to learn.
+This model serves as a meaningful research tool in exploring the learning tendencies of smaller language models and their ability to grasp simplified language constructs. Its specific training set effectively maps the idea that a constrained vocabulary and simplistic story layouts are inherently easier to learn.
+
+## Validation and Performance
+
+The model's performance was evaluated on a held-out validation set comprising 1% of the original dataset, chosen to provide an unbiased measure of the model's ability to generalize to unseen data. During evaluation, the model achieved a loss of "".
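
The argument names in the committed snippet match the parameters of `transformers.TrainingArguments`, so the training setup likely looked something like the sketch below; the `output_dir` and `evaluation_strategy` values are assumptions, not part of the commit:

```python
from transformers import TrainingArguments

# Minimal sketch: the committed hyperparameters dropped into
# transformers.TrainingArguments. output_dir and evaluation_strategy
# are assumptions, not taken from the commit.
training_args = TrainingArguments(
    output_dir="tinystories-model",   # assumed output path
    per_device_train_batch_size=32,
    gradient_accumulation_steps=4,    # effective batch of 128 per device
    warmup_steps=128,
    num_train_epochs=4,
    learning_rate=2e-4,
    bf16=True,                        # bfloat16 mixed precision
    evaluation_strategy="steps",      # assumed, so eval_steps takes effect
    eval_steps=64,
    optim="adamw_torch",
)
```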
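
The 1% held-out validation set described in the new Validation and Performance section could be produced with the `datasets` library along these lines; the split method and seed are assumptions, as the commit does not state how the split was made:

```python
from datasets import load_dataset

# Sketch of a 1% held-out validation split of the training dataset.
# train_test_split and seed=42 are assumptions, not from the commit.
dataset = load_dataset("crumb/flan-ul2-tinystories", split="train")
splits = dataset.train_test_split(test_size=0.01, seed=42)
train_data, validation_data = splits["train"], splits["test"]
```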