TinyTale-15M

TinyTale-15M is a 15-million-parameter custom GPT-2 architecture trained from scratch on the TinyStories dataset. The model serves as a baseline pipeline experiment covering custom architecture initialization, offline dataset tokenization, and GPU-accelerated training on an NVIDIA A10 workstation.

Model Architecture Specifications

  • Architecture: Custom GPT-2 (with Tied Weights)
  • Layers (Blocks): 6
  • Attention Heads: 6
  • Embedding Dimension: 384
  • Context Window: 256 tokens
  • Unique Trainable Parameters: ~15 Million (30.04M raw un-tied)
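
The parameter figures above can be sanity-checked with a little arithmetic. The sketch below assumes the standard GPT-2 block layout (fused QKV projection, 4x MLP expansion, biases, learned positional embeddings) and the stock GPT-2 vocabulary of 50,257 tokens. The card does not state the vocabulary size, so `VOCAB_SIZE` is an assumption, and `gpt2_param_count` is a hypothetical helper written for illustration.

```python
def gpt2_param_count(vocab_size, n_positions, d_model, n_layers, tie_weights=True):
    """Count parameters for a standard GPT-2 layout (assumed, not confirmed by the card)."""
    token_emb = vocab_size * d_model
    pos_emb = n_positions * d_model
    # Per block: two LayerNorms (weight + bias each), a fused QKV projection,
    # an attention output projection, and a 4x-expansion MLP, all with biases.
    layer_norms = 2 * (2 * d_model)
    attn = (d_model * 3 * d_model + 3 * d_model) + (d_model * d_model + d_model)
    mlp = (d_model * 4 * d_model + 4 * d_model) + (4 * d_model * d_model + d_model)
    per_block = layer_norms + attn + mlp
    final_ln = 2 * d_model
    # An untied LM head is a separate (vocab_size x d_model) matrix without bias.
    lm_head = 0 if tie_weights else vocab_size * d_model
    return token_emb + pos_emb + n_layers * per_block + final_ln + lm_head


VOCAB_SIZE = 50257  # assumption: stock GPT-2 tokenizer vocabulary

tied = gpt2_param_count(VOCAB_SIZE, 256, 384, 6, tie_weights=True)
untied = gpt2_param_count(VOCAB_SIZE, 256, 384, 6, tie_weights=False)
non_embedding = tied - VOCAB_SIZE * 384 - 256 * 384

print(f"tied: {tied:,}  untied: {untied:,}  non-embedding: {non_embedding:,}")
```

Under these assumptions the count with tied weights comes to 30,044,544, which lines up with the 30.04M figure quoted above. The number of attention heads does not change the total. A different vocabulary size or tying choice shifts the embedding terms, so treat this as a rough check, not as the card's own accounting.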

Training Hyperparameters

  • Hardware: 1x NVIDIA A10 GPU (24GB VRAM)
  • Precision: FP16 mixed precision
  • Batch Size: 64
  • Learning Rate: 5e-4 (with linear decay)
  • Training Steps: 3,000
  • Dataset Subset: 200,000 unique stories from roneneldan/TinyStories
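
The hyperparameters above imply a fixed training budget and learning-rate curve. The following is a minimal sketch, assuming one 256-token sequence per batch item, no gradient accumulation, no warmup, and a decay that reaches zero at the final step. None of these details are stated in the card, and `linear_decay_lr` is a hypothetical helper.

```python
def linear_decay_lr(step, peak_lr=5e-4, total_steps=3000):
    """Learning rate at `step`, decaying linearly from peak_lr to 0 (assumed: no warmup)."""
    remaining = max(0.0, 1.0 - step / total_steps)
    return peak_lr * remaining


BATCH_SIZE = 64
CONTEXT = 256
STEPS = 3000

# Sequences processed over the whole run.
sequences_seen = BATCH_SIZE * STEPS
# Upper bound on tokens processed, reached only if every sequence is full length.
max_tokens_seen = sequences_seen * CONTEXT

print(f"sequences: {sequences_seen:,}  max tokens: {max_tokens_seen:,}")
print(f"lr at step 1500: {linear_decay_lr(1500):.2e}")
```

If each story occupied exactly one sequence, 192,000 sequences would be slightly less than one pass over the 200,000-story subset. Whether the actual run packed, truncated, or split stories is not stated.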

How to Use

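from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("agentbyumer/TinyTale-15M")
model = AutoModelForCausalLM.from_pretrained("agentbyumer/TinyTale-15M")

prompt = "Once upon a time, a small puppy found a shiny key"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 100 new tokens; prompt plus output stays within the 256-token context window
output = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # assumes the tokenizer defines an EOS token
)
print(tokenizer.decode(output[0], skip_special_tokens=True))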
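
Because the context window is 256 tokens, the prompt and the generated continuation together should not exceed that length. A GPT-2-style model with learned positional embeddings has no embedding for later positions. The sketch below shows one way to cap the generation length; `safe_max_new_tokens` is a hypothetical helper, not part of the transformers API.

```python
CONTEXT_WINDOW = 256  # from the architecture specification


def safe_max_new_tokens(prompt_len, requested, context_window=CONTEXT_WINDOW):
    """Return how many new tokens can be generated without leaving the context window."""
    if prompt_len >= context_window:
        raise ValueError("prompt already fills the context window; truncate it first")
    # Never ask for more tokens than the remaining positions allow.
    return min(requested, context_window - prompt_len)


print(safe_max_new_tokens(prompt_len=12, requested=100))
print(safe_max_new_tokens(prompt_len=200, requested=100))
```

With the usage example above, the result could be passed as `max_new_tokens=safe_max_new_tokens(inputs["input_ids"].shape[1], 100)`.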

Safetensors checkpoint: 30M params, tensor type F32