---
license: mit
datasets:
- roneneldan/TinyStories
language:
- en
tags:
- text-generation-inference
---

# Simple Stories
Simple Stories is a planned series of small text-generation models trained on the [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) dataset.

The goal is to experiment with creating small language models that can perform highly specific tasks. In this case, the task is generating children's stories.

## Model Details
The model has 4M parameters (Safetensors reports 13M; I will look into why in the future). This model has not been fine-tuned to follow instructions, so it simply continues whatever text it is given. I will be working on an instruct model in the coming days.
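
One way to investigate the 4M versus 13M discrepancy is to count the tensors directly. The sketch below is a hypothetical helper (`count_tensors` is not part of the training code) that tallies parameters and non-trainable buffers separately for any PyTorch-style module. Whether saved buffers actually account for the difference here is an assumption, not a confirmed cause.

```python
def count_tensors(model):
    """Return (n_parameters, n_buffers) for a PyTorch-style module.

    Hypothetical helper for illustration: `model` only needs `.parameters()`
    and `.buffers()` methods yielding tensors that expose `.numel()`, which
    is the interface `torch.nn.Module` provides.
    """
    # Learnable weights (embeddings, attention and feed-forward matrices, ...).
    n_parameters = sum(p.numel() for p in model.parameters())
    # Non-trainable tensors registered on the module (e.g. cached masks).
    n_buffers = sum(b.numel() for b in model.buffers())
    return n_parameters, n_buffers
```

Calling `count_tensors(model)` on the model loaded in the Usage section gives both totals, which can then be compared against the figure shown on the model page.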

The model is a decoder-only transformer with 4 decoder layers and 2 attention heads. It was trained for 3 epochs on only ~50MB of text and can already produce semi-coherent stories.

The code used to train the model can be found on my [GitHub](https://github.com/broskicodes/slms).

## Usage
1. Import the relevant HuggingFace Auto classes and load model and tokenizer:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("broskicodes/simple-stories-4M", trust_remote_code=True)
```
2. Tokenize your input sequence and call the `model.model.generate` function:

```
inputs = tokenizer("Once upon a time,", return_tensors="pt", return_attention_mask=False)
outputs = model.model.generate(inputs['input_ids'], 250)
```

Note that we are calling `model.model.generate`, not just `model.generate`.

3. Decode the output and print the text:

```
text = tokenizer.batch_decode(outputs)[0]
print(text)
```
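
The three steps above can also be wrapped in a single function. This is a minimal sketch rather than an official API: `generate_story` and its parameter names are hypothetical, and the second argument to `model.model.generate` is assumed to be the number of tokens to generate (the snippet above passes `250` without naming it). Wrapping the call in `torch.no_grad()` is an addition that should only skip gradient tracking.

```python
def generate_story(prompt, num_tokens=250, repo_id="broskicodes/simple-stories-4M"):
    """Load the model and tokenizer, then return the generated text for `prompt`."""
    # Imported lazily so that defining this helper does not require the
    # libraries to be installed or the weights to be downloaded.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

    # Same tokenizer call as in step 2.
    inputs = tokenizer(prompt, return_tensors="pt", return_attention_mask=False)

    # `num_tokens` is ASSUMED to be the number of tokens to generate;
    # the custom `generate` lives on `model.model`, not on `model`.
    with torch.no_grad():
        outputs = model.model.generate(inputs["input_ids"], num_tokens)

    # Same decoding as in step 3: take the first (and only) sequence.
    return tokenizer.batch_decode(outputs)[0]
```

For example, `print(generate_story("Once upon a time,"))` reproduces the walkthrough above in one call. Because the model is reloaded on every call, loading it once and reusing it is preferable when generating many stories.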

## Sample
Here is a short sample generated by the model.

```Once upon a time, there was a little girl called Daisy. Daisy wanted to go to the park with her mommy. She packed some yummy food and chirpies and carried them . Daisy was so excited for her mommy to try. The puppy and Mommy brought a big spoon to make souping. Daisy loved swimming and jun ate until she was disappointed. They began to start playing in the garden. They gathered around and ate and boot into the bread . As Daisy got hungry on the grass, she found some magic. She read more to see what was Luckily, Daisy was very impressed. When the lady opened the pot, something tickling to another. It was a rare. Daisy was so happy that she gave the tunately. Daisy was no longer scared. She knew she had to tell Mommy at the store. She took her to the soup and opened the tasty hot chocolate. When Daisy gave it to Daisy and princessed around a special spoon every day.```

No, the story doesn't fully make sense. But most of the words are valid English and the characters and overarching plot are consistent. This is progress :)

## Going forward
The immediate next step is creating an instruct model for interacting with and generating custom stories. After that, I will continue working to improve the base model by increasing the amount of data it is trained on and continuing to experiment with different hyperparameters.

If you have any suggestions or questions, or you want to discuss anything about the model, please reach out to me on Twitter [@_broskitweets](https://twitter.com/_broskitweets).