
Tinystories-30m-UL2

GPT-4 generated model card

Model Details

  • Model Name: crumb/opentinystories-30m-base
  • Model Type: GPTNeoXForCausalLM
  • Model Training Details: The model is trained on crumb/flan-ul2-tinystories, which contains around a quarter of a million examples generated by Flan-UL2 (20b) with the prompt "Write a short story using the vocabulary of a first-grader."

Model Description

This model is trained to generate short narratives using a vocabulary limited to the level of a first-grader. It is designed to produce simple, easily comprehensible text.

Because it learns from text generated by Flan-UL2 (20b), the model adopts a simple storyline layout and a minimal vocabulary, both of which are expected to be easier for a small model to learn and replicate.

Training

The model is trained for four epochs on the crumb/flan-ul2-tinystories dataset (inspired by roneneldan/TinyStories), which was created with Flan-UL2 (20b) rather than the GPT-3.5/4 used for the original TinyStories. The data follows the format of a simple, first-grader-level narrative, which helps the model learn simple vocabulary and sentence structure.

Training arguments:

per_device_train_batch_size=16,
gradient_accumulation_steps=8,
warmup_steps=128,
num_train_epochs=4,
learning_rate=2e-4,
eval_steps=64,
optim="adamw_torch",
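As a rough sketch of how the arguments above fit together, the snippet below collects them in a dictionary, computes the effective batch size they imply, and shows one way they could be passed to `transformers.TrainingArguments`. The helper names, the `output_dir` parameter, and the single-device default are assumptions made for illustration; the card does not say how many devices were used or which other arguments were set.

```python
# Values copied from the training arguments listed above.
TRAIN_ARGS = {
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 8,
    "warmup_steps": 128,
    "num_train_epochs": 4,
    "learning_rate": 2e-4,
    "eval_steps": 64,
    "optim": "adamw_torch",
}


def effective_batch_size(args, num_devices=1):
    """Sequences consumed per optimizer step.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return (
        args["per_device_train_batch_size"]
        * args["gradient_accumulation_steps"]
        * num_devices
    )


def build_training_arguments(output_dir):
    """Hypothetical helper: wrap the listed values in TrainingArguments.

    output_dir is not given in the card. eval_steps only takes effect when a
    step-based evaluation strategy is also set; that setting is omitted here
    because the card does not list it.
    """
    # Imported lazily so the arithmetic above runs without transformers installed.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAIN_ARGS)


print(effective_batch_size(TRAIN_ARGS))
```

Under the single-device assumption, each optimizer step covers 16 × 8 = 128 sequences, so the 128 warmup steps correspond to 128 optimizer updates of that size.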

Usage

This model is intended as a research tool for exploring how smaller language models learn and how well they grasp simplified language. Its training set is built around the idea that a constrained vocabulary and simple story layouts are easier to learn.
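For readers who want to try the model, here is a minimal sketch of loading it with the Hugging Face `transformers` library and sampling a story. The repository name comes from Model Details; the `generate_story` helper, the "Once upon a time" prompt, and the sampling settings are illustrative assumptions, not values taken from the card.

```python
MODEL_ID = "crumb/opentinystories-30m-base"  # repository name from Model Details


def generate_story(prompt="Once upon a time", max_new_tokens=128):
    """Sample one continuation of `prompt` (prompt and settings are assumptions)."""
    # Imported lazily so this file can be imported without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        # Pass only the tensors the model needs, in case the tokenizer
        # returns extra fields.
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,
        top_p=0.95,
        # Reuse the end-of-sequence token as padding during generation.
        pad_token_id=tokenizer.eos_token_id,
    )
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `print(generate_story())` downloads the weights on first use and prints one sampled story; because sampling is enabled, the output differs between runs.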

Validation and Performance

The model was evaluated on a held-out validation set comprising 1% of the original dataset. It reached a validation loss of 2.284920 and a training loss of 2.647377.
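To make the loss values easier to interpret, the sketch below converts them to perplexity. It assumes the reported numbers are mean per-token cross-entropy losses in nats, which is the usual convention for causal language model training but is not stated in the card; the `perplexity` helper is introduced here for illustration.

```python
import math


def perplexity(mean_loss_nats):
    """Perplexity is the exponential of the mean per-token cross-entropy.

    Assumes the loss is measured in nats; the card does not confirm this.
    """
    return math.exp(mean_loss_nats)


eval_ppl = perplexity(2.284920)   # validation loss reported above
train_ppl = perplexity(2.647377)  # training loss reported above

print(f"validation perplexity: {eval_ppl:.2f}")
print(f"training perplexity:   {train_ppl:.2f}")
```

Under that assumption, the validation loss corresponds to a perplexity just under 10 and the training loss to roughly 14.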

Model size: 36.9M params (Safetensors; tensor types F32 and BOOL)