---
license: mit
datasets:
- crumb/flan-ul2-tinystories
language:
- en
---
# Tinystories-30m-UL2
*GPT-4 generated model card*
## Model Details
- **Model Name**: GPTNeoX/flan-ul2-tinystories
- **Model Type**: GPTNeoXForCausalLM (Language Modeling)
- **Model Training Details**: The model was trained on [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories), which contains around a quarter of a million examples generated by Flan-UL2 (20b) with the prompt "Write a short story using the vocabulary of a first-grader."
## Model Description
This model is trained to generate short narratives using a vocabulary limited to the level of a first-grader. Its output is meant to be simple and easy to understand, both in sentence structure and in word choice.
Because it learns from text generated by Flan-UL2 (20b), the model picks up that data's simple storyline layout and minimal vocabulary, which are assumed to be easier for a small model to learn and reproduce.
## Training
The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset, which is inspired by [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) but was generated with Flan-UL2 (20b) rather than the GPT-3.5/4 used for the original TinyStories. The data follows the format of a simple, first-grader-level narrative, which helps the model learn simple vocabulary and sentence structure.
Training arguments:
```
per_device_train_batch_size=32,
gradient_accumulation_steps=4,
warmup_steps=128,
num_train_epochs=4,
learning_rate=2e-4,
bf16=True,
eval_steps=64,
optim="adamw_torch",
```
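As a rough guide to what these arguments imply, the sketch below estimates the effective batch size and the approximate number of optimizer steps. It assumes a single device, a dataset of exactly 250,000 examples (the card only says "around a quarter of a million"), and the 1% hold-out described under Validation and Performance; the function name `estimate_steps` is illustrative, and the trainer's exact step count may differ slightly depending on how partial batches are handled.

```python
import math


def estimate_steps(num_examples, per_device_batch, grad_accum, num_devices=1):
    """Return (effective batch size, approximate optimizer steps per epoch)."""
    effective = per_device_batch * grad_accum * num_devices
    return effective, math.ceil(num_examples / effective)


# Assumptions (not stated in the card): 250,000 examples in total,
# 1% of them held out for validation, and training on one device.
total_examples = 250_000
train_examples = total_examples - total_examples // 100

effective_batch, steps_per_epoch = estimate_steps(
    train_examples, per_device_batch=32, grad_accum=4
)
total_steps = steps_per_epoch * 4        # num_train_epochs=4
warmup_fraction = 128 / total_steps      # warmup_steps=128

print(f"effective batch size: {effective_batch}")
print(f"approx. steps per epoch: {steps_per_epoch}")
print(f"approx. total steps: {total_steps}")
print(f"warmup covers about {warmup_fraction:.1%} of training")
```

Under these assumptions each optimizer step sees 128 sequences, so the 128 warmup steps cover only a small fraction of the run.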
## Usage
This model is intended as a research tool for exploring how smaller language models learn and how well they grasp simplified language constructs. Its training set is built around the idea that a constrained vocabulary and simple story layouts are inherently easier to learn.
## Validation and Performance
The model's performance was evaluated on a held-out validation set that makes up 1% of the original dataset. Holding this data out of training gives an unbiased estimate of how well the model generalizes to unseen data. During evaluation, the model achieved a loss of "".
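
The card does not say how the 1% hold-out was drawn, so the following is only a minimal sketch of one common approach: a seeded random split, plus the usual conversion from mean cross-entropy loss to perplexity. The helper names `split_holdout` and `perplexity`, the seed, and the placeholder stories are all assumptions made for illustration, and no real evaluation number is used.

```python
import math
import random


def split_holdout(examples, holdout_fraction=0.01, seed=0):
    """Shuffle indices with a fixed seed and hold out a fraction for validation."""
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(len(examples) * holdout_fraction))
    val_indices = set(indices[:n_val])
    train = [ex for i, ex in enumerate(examples) if i not in val_indices]
    val = [examples[i] for i in sorted(val_indices)]
    return train, val


def perplexity(mean_cross_entropy_loss):
    """Perplexity is the exponential of the mean per-token cross-entropy (in nats)."""
    return math.exp(mean_cross_entropy_loss)


# Placeholder data standing in for the real stories.
stories = [f"story {i}" for i in range(1000)]
train_set, val_set = split_holdout(stories)
print(len(train_set), len(val_set))
```

Fixing the seed keeps the validation set identical across runs, which matters when comparing losses between checkpoints.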