---
license: mit
datasets:
- crumb/flan-ul2-tinystories
language:
- en
---
# Tinystories-30m-UL2

*GPT-4 generated model card*

## Model Details

- **Model Name**: GPTNeoX/flan-ul2-tinystories
- **Model Type**: GPTNeoXForCausalLM (Language Modeling)
- **Model Training Details**: The model is trained on [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories), which contains around a quarter of a million examples generated by Flan-UL2 (20b) with the prompt "Write a short story using the vocabulary of a first-grader."

## Model Description

This model is trained to generate short narratives using a vocabulary limited to the level of a first-grader. It is designed to produce simple, easily comprehensible text.

Because it learns from text generated by Flan-UL2 (20b), the model adopts a simple storyline layout and a minimal vocabulary, both of which are expected to be easier for a small model to learn and reproduce.

## Training

The model is trained for four epochs on the [crumb/flan-ul2-tinystories](https://huggingface.co/datasets/crumb/flan-ul2-tinystories) dataset (inspired by [roneneldan/TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories)). The dataset was generated with Flan-UL2 (20b) rather than GPT-3.5/4, which was used for the original TinyStories. Each example follows the format of a simple, first-grader-level narrative, which helps the model learn simple vocabulary and sentence structure.

Training arguments:

```
per_device_train_batch_size=32,
gradient_accumulation_steps=4,
warmup_steps=128,
num_train_epochs=4,
learning_rate=2e-4,
bf16=True,
eval_steps=64,
optim="adamw_torch",
```
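As a rough illustration of what these arguments imply, the sketch below works out the effective batch size and an approximate step count. The device count and the exact dataset size are assumptions (the card only says "around a quarter of a million" examples, with 1% held out for validation), so the step totals are estimates, not figures from the actual run.

```python
import math

# Values copied from the training arguments above.
per_device_train_batch_size = 32
gradient_accumulation_steps = 4
num_train_epochs = 4
warmup_steps = 128

# Assumptions (not stated in the card): a single device and a round
# 250,000 examples, of which 1% is held out for validation.
num_devices = 1
total_examples = 250_000
train_examples = total_examples - total_examples // 100

# Sequences contributing to each optimizer update.
effective_batch_size = (
    per_device_train_batch_size * gradient_accumulation_steps * num_devices
)

# Approximate optimizer updates per epoch and over the whole run.
steps_per_epoch = math.ceil(train_examples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs

# Share of the run spent in learning-rate warmup.
warmup_fraction = warmup_steps / total_steps

print(f"effective batch size: {effective_batch_size}")
print(f"approx. steps per epoch: {steps_per_epoch}")
print(f"approx. total steps: {total_steps}")
print(f"warmup fraction: {warmup_fraction:.3f}")
```

Under these assumptions, warmup covers only a small share of the run, and `eval_steps=64` would trigger an evaluation roughly every 8,192 training sequences.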

## Usage

This model is intended as a research tool for exploring how smaller language models learn and how well they grasp simplified language constructs. Its training set is built around the premise that a constrained vocabulary and simple story layouts are easier to learn.
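A minimal sketch of sampling a story with the `transformers` library is shown here. The repository id is an assumption inferred from this card's title and the dataset owner, and the prompt and sampling settings are illustrative choices, not values recommended by the authors.

```python
# Assumption: the repository id is inferred from the card title and the
# dataset owner; replace it with the actual id if it differs.
MODEL_ID = "crumb/Tinystories-30m-UL2"


def generate_story(prompt: str = "Once upon a time", max_new_tokens: int = 128) -> str:
    """Sample a short continuation of `prompt` from the model."""
    # Imported lazily so that defining this function does not require
    # transformers or a network connection.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        do_sample=True,      # illustrative sampling settings
        top_p=0.95,
        temperature=0.8,
        pad_token_id=tokenizer.eos_token_id,
    )
    # Decode the single generated sequence back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_story("One day, a little dog")` downloads the weights on first use and returns the prompt followed by the sampled continuation.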

## Validation and Performance

The model was evaluated on a held-out validation set comprising 1% of the original dataset. This set provides an estimate of how well the model generalizes to unseen data. The final validation loss is not recorded in this card.
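The sketch below shows one way a 1% held-out split could be formed and how a mean cross-entropy loss maps to perplexity. The card does not say how the split was actually made, so the shuffling, the seed, and the helper names `split_indices` and `perplexity` are hypothetical.

```python
import math
import random


def split_indices(num_examples, holdout_fraction=0.01, seed=0):
    """Return (train_indices, validation_indices) for a random holdout.

    Hypothetical helper: the seed and shuffling scheme are assumptions.
    """
    indices = list(range(num_examples))
    random.Random(seed).shuffle(indices)
    # Keep at least one validation example even for tiny datasets.
    num_holdout = max(1, round(num_examples * holdout_fraction))
    return indices[num_holdout:], indices[:num_holdout]


def perplexity(mean_cross_entropy_loss):
    """Convert a mean per-token cross-entropy loss (in nats) to perplexity."""
    return math.exp(mean_cross_entropy_loss)


train_idx, val_idx = split_indices(1000)
print(len(train_idx), len(val_idx))
```

Perplexity is often easier to compare across runs than raw loss, since it can be read as the effective number of equally likely next tokens.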