TinyTale-15M

TinyTale-15M is a 15-million-parameter custom GPT-2 architecture trained from scratch on the TinyStories dataset. It serves as a baseline pipeline experiment covering custom architecture initialization, offline dataset tokenization, and hardware-accelerated training on an NVIDIA A10 GPU workstation.

Model Architecture Specifications

Architecture: Custom GPT-2 (with Tied Weights)
Layers (Blocks): 6
Attention Heads: 6
Embedding Dimension: 384
Context Window: 256 tokens
Unique Trainable Parameters: ~15 million (30.04M raw, untied)
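
The specifications above can be turned into a rough parameter count by hand. The sketch below assumes the standard GPT-2 layout (fused QKV projection, 4x MLP expansion, biases on all linear layers, learned position embeddings) and the standard GPT-2 vocabulary of 50,257 tokens. The vocabulary size is not stated in this card, so treat it as an assumption, and `gpt2_param_count` is a hypothetical helper, not part of any library.

```python
def gpt2_param_count(vocab_size, n_positions, n_embd, n_layer, tie_weights=True):
    """Count parameters of a GPT-2 style decoder (standard layout assumed)."""
    wte = vocab_size * n_embd            # token embedding matrix
    wpe = n_positions * n_embd           # learned position embeddings
    # Attention: fused QKV projection plus output projection, both with bias
    attn = (n_embd * 3 * n_embd + 3 * n_embd) + (n_embd * n_embd + n_embd)
    # MLP: expand to 4 * n_embd and project back, both with bias
    mlp = (n_embd * 4 * n_embd + 4 * n_embd) + (4 * n_embd * n_embd + n_embd)
    layer_norms = 2 * (2 * n_embd)       # two LayerNorms per block (weight + bias)
    block = attn + mlp + layer_norms
    final_ln = 2 * n_embd                # final LayerNorm
    # With tied weights the output head reuses the token embedding matrix
    lm_head = 0 if tie_weights else vocab_size * n_embd
    return wte + wpe + n_layer * block + final_ln + lm_head


# Assumed vocabulary: 50,257 tokens (standard GPT-2 tokenizer)
tied = gpt2_param_count(50257, 256, 384, 6, tie_weights=True)
untied = gpt2_param_count(50257, 256, 384, 6, tie_weights=False)
non_embedding = tied - (50257 * 384) - (256 * 384)

print(f"{tied:,}")           # 30,044,544
print(f"{untied:,}")         # 49,343,232
print(f"{non_embedding:,}")  # 10,647,552
```

Under these assumptions, the total with the embedding matrix counted once is 30,044,544, which lines up with the 30.04M figure quoted above, and the transformer blocks plus the final LayerNorm account for roughly 10.6M. A different vocabulary size would change the totals.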

Training Hyperparameters

Hardware: 1x NVIDIA A10 GPU (24GB VRAM)
Precision: FP16 mixed precision
Batch Size: 64
Learning Rate: 5e-4 (with linear decay)
Training Steps: 3,000
Dataset Subset: 200,000 unique stories from roneneldan/TinyStories
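
To make the schedule and training budget concrete, here is a minimal sketch of a linear learning-rate decay and the token count implied by the settings above. It assumes the rate decays to zero at the final step with no warmup, and that the batch size counts sequences of the full 256-token context. Neither detail is stated in this card, and both function names are hypothetical.

```python
def linear_decay_lr(step, base_lr=5e-4, total_steps=3000):
    """Learning rate at `step`, decaying linearly from base_lr to 0.

    Assumption: no warmup phase and a final rate of zero.
    """
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


def tokens_processed(batch_size=64, context=256, steps=3000):
    """Upper bound on tokens seen, assuming every sequence fills the context."""
    return batch_size * context * steps


for step in (0, 1500, 3000):
    print(step, linear_decay_lr(step))

print(f"{tokens_processed():,}")  # 49,152,000
```

Under these assumptions the run processes at most about 49 million tokens, and the learning rate at the midpoint is half the peak value.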

How to Use

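from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("agentbyumer/TinyTale-15M")
model = AutoModelForCausalLM.from_pretrained("agentbyumer/TinyTale-15M")

prompt = "Once upon a time, a small puppy found a shiny key"
inputs = tokenizer(prompt, return_tensors="pt")

# Sample up to 100 tokens in total (prompt included); the context window is 256 tokens
output = model.generate(
    **inputs,
    max_length=100,
    do_sample=True,
    temperature=0.8,
    pad_token_id=tokenizer.eos_token_id,  # avoids the missing pad token warning
)
print(tokenizer.decode(output[0], skip_special_tokens=True))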

Safetensors checkpoint: 30M params, tensor type F32