broskicodes committed on
Commit
e9bb729
1 Parent(s): 2a434e9

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -16,7 +16,7 @@ The goal is to experiment with creating small language models that can perform h
  ## Model Details
  The model has 4M parameters (Safetensors seems to have inflated this to 13M, I will look into why in the future). This model has not been fine-tuned for instructions. It will simply spew out text when asked. I will be working on an instruct model in the coming days.
 
- The model is a decoder only transformer model with 4 decoder layers and 2 attention heads. The model was trained on only ~50MB of text and can already produce semi-coherent stories.
+ The model is a decoder only transformer model with 4 decoder layers and 2 attention heads. The model was trained for 3 epochs on only ~50MB of text and can already produce semi-coherent stories.
 
  The code used to train the model can be found on my [github](https://github.com/broskicodes/slms). For now, this is also the only way to train and obtain the tokenizer necessary for encoding and decoding text. Check it out if you are interested.
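
The architecture in the changed line (decoder-only transformer, 4 decoder layers, 2 attention heads, ~4M parameters) can be sanity-checked with a back-of-the-envelope parameter count. This is a minimal sketch: the vocabulary size (8192), model width (224), and 4x feed-forward expansion are assumptions chosen to land near the stated 4M, not values taken from the repository.

```python
# Hypothetical parameter count for a GPT-style decoder-only transformer.
# Only n_layers=4 comes from the README; vocab_size, d_model, and the 4x
# MLP width are assumed. Biases on linear layers are omitted for simplicity,
# and the LM head is assumed tied to the token embedding.

def decoder_param_count(vocab_size: int, d_model: int, n_layers: int) -> int:
    """Approximate weight count for a decoder-only transformer."""
    embeddings = vocab_size * d_model      # token embedding (tied LM head)
    attn = 4 * d_model * d_model           # Q, K, V, and output projections
    mlp = 2 * (4 * d_model) * d_model      # up- and down-projection, 4x width
    layer_norms = 2 * (2 * d_model)        # two LayerNorms (scale + bias)
    per_layer = attn + mlp + layer_norms
    final_norm = 2 * d_model
    return embeddings + n_layers * per_layer + final_norm

total = decoder_param_count(vocab_size=8192, d_model=224, n_layers=4)
print(f"{total:,}")  # 4,247,488 -- roughly the 4M the README states
```

Note that the number of attention heads (2) does not affect the count: heads split `d_model` among themselves, so the projection matrices have the same total size regardless.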