TinyStories-45M

A 45-million-parameter language model for creative story generation, pretrained on the TinyStories dataset and fine-tuned on TinyStoriesInstruct. The model follows the LLaMA architecture with grouped-query attention (GQA) and is optimized for short-form narrative text.

Model Details

Attribute          Value
Architecture       LLaMA-style (decoder-only transformer)
Parameters         45.46M
Hidden Size        512
Layers             13
Attention Heads    8
KV Heads (GQA)     4
Intermediate Size  1344
Vocab Size         16,384
Context Length     512
Tied Embeddings    Yes
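
The reported parameter count can be checked against the values above. The following is a minimal sketch, assuming the standard LLaMA layout that the table does not spell out: bias-free linear layers, a gated MLP with three projection matrices, and two RMSNorm weight vectors per layer plus a final one. The variable names are illustrative.

```python
# Configuration values copied from the Model Details table.
hidden_size = 512
num_layers = 13
num_heads = 8
num_kv_heads = 4
intermediate_size = 1344
vocab_size = 16384

head_dim = hidden_size // num_heads   # 64
kv_dim = num_kv_heads * head_dim      # 256: GQA shrinks the K and V projections

# Assumption: bias-free projections, as in standard LLaMA.
attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim  # q, o and k, v
mlp = 3 * hidden_size * intermediate_size                             # gate, up, down
norms = 2 * hidden_size                                               # two RMSNorm weights per layer
per_layer = attention + mlp + norms

embeddings = vocab_size * hidden_size  # counted once because the embeddings are tied
final_norm = hidden_size

total_params = embeddings + num_layers * per_layer + final_norm
print(f"{total_params:,}")  # 45,463,040
```

Under these assumptions the total is 45,463,040, which agrees with the 45.46M listed above. With 8 query heads and 4 KV heads, each KV head is shared by two query heads, so the K and V projections are half the size they would be under full multi-head attention.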

Training

Pretraining

  • Dataset: roneneldan/TinyStories
  • Epochs: 3
  • Effective Batch Size: 128
  • Learning Rate: 5e-4 with cosine decay
  • Warmup: 1%
  • Weight Decay: 0.1
  • Precision: FP16
  • Optimizer: AdamW
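
The learning-rate settings above can be sketched as a schedule function. This is an illustration, not the training script: it assumes the 1% warmup is linear and measured in optimizer steps, and that the cosine decays to zero, none of which is stated in this card. The function name `lr_at` and the `min_lr` parameter are hypothetical.

```python
import math

PEAK_LR = 5e-4          # from the list above
WARMUP_FRACTION = 0.01  # "Warmup: 1%", assumed to be a fraction of total steps


def lr_at(step, total_steps, peak_lr=PEAK_LR,
          warmup_fraction=WARMUP_FRACTION, min_lr=0.0):
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    warmup_steps = max(1, int(total_steps * warmup_fraction))
    if step < warmup_steps:
        # Assumption: linear ramp that reaches peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
    return min_lr + (peak_lr - min_lr) * cosine


# Example with a hypothetical run length of 10,000 steps.
for s in (0, 99, 5050, 10000):
    print(s, lr_at(s, total_steps=10000))
```

In this sketch the rate reaches its peak of 5e-4 at the end of warmup, passes through half the peak at the midpoint of the decay phase, and ends at `min_lr`.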

Supervised Fine-Tuning (SFT)

  • Dataset: roneneldan/TinyStoriesInstruct
  • Epochs: 1
  • Learning Rate: 1e-4
  • Loss Masking: Assistant-only (loss is computed only on the story completion tokens, not on the prompt)
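
Loss masking of this kind is commonly implemented by replacing the prompt positions in the label sequence with an ignore value. The sketch below shows the idea in plain Python; the helper name `mask_prompt_labels` and the example token IDs are hypothetical, and the actual fine-tuning code for this model may differ.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def mask_prompt_labels(input_ids, prompt_length):
    """Return labels that keep only the completion tokens as training targets."""
    if not 0 <= prompt_length <= len(input_ids):
        raise ValueError("prompt_length must be between 0 and len(input_ids)")
    return [IGNORE_INDEX] * prompt_length + list(input_ids[prompt_length:])


# Hypothetical token IDs: the first three stand for the prompt, the rest for the story.
example_ids = [5, 17, 42, 301, 302, 2]
print(mask_prompt_labels(example_ids, prompt_length=3))
# [-100, -100, -100, 301, 302, 2]
```

Positions labeled -100 contribute nothing to the loss, so gradients come only from the story tokens.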

Tokenizer

  • Type: SentencePiece Unigram
  • Vocab Size: 16,384
  • Special Tokens: <pad>, <eos>, <bos>, <unk>, <|im_end|>

Evaluation

Metric           Value
Validation Loss  0.8291
Perplexity       2.2911
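
The two metrics are consistent with each other if perplexity is taken as the exponential of the validation loss. A quick check, assuming the loss is a mean per-token cross-entropy in nats (the usual convention, though not stated here):

```python
import math

validation_loss = 0.829051066686119  # validation loss as reported by the evaluation run
perplexity = math.exp(validation_loss)
print(round(perplexity, 3))  # 2.291
```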

50-Prompt Inference

See evaluation/50_prompts.json for generated story samples.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and its tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained("razor5050/TinyStories-45M")
tokenizer = AutoTokenizer.from_pretrained("razor5050/TinyStories-45M")

# The prompt lists the story constraints and ends with "Story:" for the model to continue.
prompt = "Features: a brave cat\nWords: moon, adventure\nSummary: A cat goes on a moon adventure\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 200 new tokens; temperature 0.8 adds some variety to the output.
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
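
To build prompts for other stories, the layout of the example prompt can be wrapped in a small helper. This is a sketch: the function name `build_prompt` is hypothetical, and it assumes that the model expects exactly these three fields in this order, which is inferred from the single example above.

```python
def build_prompt(features, words, summary):
    """Assemble a prompt in the Features/Words/Summary/Story layout of the usage example."""
    return (
        f"Features: {features}\n"
        f"Words: {', '.join(words)}\n"
        f"Summary: {summary}\n"
        "Story:"
    )


prompt = build_prompt(
    features="a brave cat",
    words=["moon", "adventure"],
    summary="A cat goes on a moon adventure",
)
print(prompt)
```

The resulting string is identical to the hand-written prompt in the usage example and can be passed to the tokenizer in the same way.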

Hardware

  • Training GPU: NVIDIA RTX 3060 12GB
  • Training Time: ~8-10 hours (pretrain + SFT)

Citation

@dataset{roneneldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Ronen Eldan and Yuanzhi Li},
  year={2023}
}

Generated: 2026-05-20 18:37:02
