Instructions for using razor5050/TinyStories-45M with libraries and local apps.

Libraries

How to use razor5050/TinyStories-45M with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="razor5050/TinyStories-45M")

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("razor5050/TinyStories-45M")
model = AutoModelForCausalLM.from_pretrained("razor5050/TinyStories-45M")

Local Apps

vLLM

How to use razor5050/TinyStories-45M with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "razor5050/TinyStories-45M"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "razor5050/TinyStories-45M",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'
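The same completions call can be made from Python using only the standard library. This is a minimal sketch that assumes a server started with the command above is listening on http://localhost:8000. The helper names `build_completion_request`, `extract_text`, and `complete` are hypothetical. The request and response fields follow the OpenAI-compatible Completions API that the curl example targets. `max_tokens` defaults to 256 so that the prompt plus the completion stays within the model's 512-token context.

```python
import json
import urllib.request


def build_completion_request(prompt: str,
                             base_url: str = "http://localhost:8000",
                             model: str = "razor5050/TinyStories-45M",
                             max_tokens: int = 256,
                             temperature: float = 0.5) -> urllib.request.Request:
    """Build a POST request for the OpenAI-compatible /v1/completions endpoint."""
    body = {
        "model": model,
        "prompt": prompt,
        # Prompt tokens + max_tokens must fit in the 512-token context length.
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_json: dict) -> str:
    """Return the generated text of the first choice in a completions response."""
    return response_json["choices"][0]["text"]


def complete(prompt: str, **kwargs) -> str:
    """Send the request to a running server and return the generated text."""
    req = build_completion_request(prompt, **kwargs)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return extract_text(json.load(resp))
```

With the vLLM server running, `complete("Once upon a time,")` returns the generated continuation. For the SGLang server described below, pass `base_url="http://localhost:30000"`.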

Use Docker

docker model run hf.co/razor5050/TinyStories-45M

SGLang

How to use razor5050/TinyStories-45M with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "razor5050/TinyStories-45M" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "razor5050/TinyStories-45M",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "razor5050/TinyStories-45M" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "razor5050/TinyStories-45M",
		"prompt": "Once upon a time,",
		"max_tokens": 256,
		"temperature": 0.5
	}'

Docker Model Runner
How to use razor5050/TinyStories-45M with Docker Model Runner:
```
docker model run hf.co/razor5050/TinyStories-45M
```

TinyStories-45M

A 45-million-parameter language model trained entirely on TinyStories data for short story generation. It is pretrained on the TinyStories dataset and then fine-tuned on TinyStoriesInstruct. The model follows the LLaMA architecture with grouped-query attention (GQA) and is designed for short-form narrative text.
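The GQA arrangement can be illustrated with a small sketch that uses the head counts from the Model Details table: 8 query heads, 4 key/value heads, 13 layers, and hidden size 512. The sketch assumes a head dimension of 512 / 8 = 64 and 4-byte (F32) cache values. Both helper names are hypothetical. Each KV head is shared by two consecutive query heads. This is the common convention, used for example by `repeat_kv` in the Hugging Face Llama code. It halves the key/value cache compared with full multi-head attention.

```python
# Hypothetical helpers illustrating grouped-query attention (GQA) bookkeeping.
# Head counts come from the Model Details table; HEAD_DIM and 4-byte values are assumptions.
NUM_HEADS = 8        # query heads
NUM_KV_HEADS = 4     # key/value heads shared across query heads
NUM_LAYERS = 13
HIDDEN_SIZE = 512
HEAD_DIM = HIDDEN_SIZE // NUM_HEADS  # 64


def kv_head_for(query_head: int, num_heads: int = NUM_HEADS,
                num_kv_heads: int = NUM_KV_HEADS) -> int:
    """Index of the key/value head that a query head attends with."""
    group_size = num_heads // num_kv_heads  # query heads per KV head (2 here)
    return query_head // group_size


def kv_cache_bytes(seq_len: int, num_kv_heads: int = NUM_KV_HEADS,
                   bytes_per_value: int = 4) -> int:
    """Bytes needed to cache K and V for one sequence across all layers."""
    return 2 * NUM_LAYERS * num_kv_heads * HEAD_DIM * seq_len * bytes_per_value


mapping = [kv_head_for(h) for h in range(NUM_HEADS)]
gqa_bytes = kv_cache_bytes(512)                          # GQA with 4 KV heads
mha_bytes = kv_cache_bytes(512, num_kv_heads=NUM_HEADS)  # hypothetical full MHA
print(mapping)                                       # [0, 0, 1, 1, 2, 2, 3, 3]
print(gqa_bytes, mha_bytes, mha_bytes / gqa_bytes)   # 13631488 27262976 2.0
```

At the full 512-token context, the 4-KV-head cache comes to about 13 MiB per sequence under these assumptions, versus about 26 MiB if every query head had its own K and V.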

Model Details

Attribute	Value
Architecture	LLaMA-style (decoder-only transformer)
Parameters	45.46M
Hidden Size	512
Layers	13
Attention Heads	8
KV Heads (GQA)	4
Intermediate Size	1344
Vocab Size	16384
Context Length	512
Tied Embeddings	Yes
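The 45.46M figure can be reproduced from the table values. This is a sketch that assumes a standard LLaMA layout: no bias terms, a gated (SwiGLU) MLP with three projections, two RMSNorm weight vectors per layer plus one final norm, and an output head tied to the input embedding. The function name `llama_param_count` is hypothetical.

```python
# Sketch: reproduce the 45.46M parameter count from the Model Details table.
# Assumes a standard LLaMA layout: no bias terms, a gated (SwiGLU) MLP with three
# projections, two RMSNorm weight vectors per layer, one final norm, tied LM head.
def llama_param_count(vocab_size: int, hidden_size: int, num_layers: int,
                      num_heads: int, num_kv_heads: int, intermediate_size: int,
                      tied_embeddings: bool = True) -> int:
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim
    embeddings = vocab_size * hidden_size
    # q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim (GQA)
    attention = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
    mlp = 3 * hidden_size * intermediate_size  # gate_proj, up_proj, down_proj
    norms = 2 * hidden_size                    # input and post-attention RMSNorm
    per_layer = attention + mlp + norms
    final_norm = hidden_size
    lm_head = 0 if tied_embeddings else vocab_size * hidden_size
    return embeddings + num_layers * per_layer + final_norm + lm_head


total = llama_param_count(vocab_size=16384, hidden_size=512, num_layers=13,
                          num_heads=8, num_kv_heads=4, intermediate_size=1344)
print(f"{total:,} parameters ({total / 1e6:.2f}M)")  # 45,463,040 parameters (45.46M)
```

Tying the embeddings matters at this scale. With `tied_embeddings=False`, the count would grow by another 16384 × 512 = 8,388,608 parameters, to about 53.85M.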

Training

Pretraining

Dataset: roneneldan/TinyStories
Epochs: 3
Effective Batch Size: 128
Learning Rate: 5e-4 with cosine decay
Warmup: 1%
Weight Decay: 0.1
Precision: FP16
Optimizer: AdamW
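The card gives a peak learning rate of 5e-4, cosine decay, and 1% warmup, but not the warmup shape or the final learning rate. This minimal sketch assumes a linear warmup from 0 followed by a half-cosine decay to 0. That is the same shape as `get_cosine_schedule_with_warmup` in Transformers, though the card does not say which scheduler was used. `lr_at` and the `total_steps` value are hypothetical.

```python
import math


# Sketch of a cosine learning-rate schedule with linear warmup, assuming warmup
# from 0 and decay to 0 (the card does not state the final learning rate).
def lr_at(step: int, total_steps: int, peak_lr: float = 5e-4,
          warmup_frac: float = 0.01) -> float:
    """Learning rate at a given optimizer step."""
    warmup_steps = max(1, int(total_steps * warmup_frac))
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / warmup_steps
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: peak_lr at progress 0, 0 at progress 1.
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


total_steps = 10_000  # hypothetical; the real step count is not given in the card
for step in (0, 50, 100, 5_050, 10_000):
    print(step, f"{lr_at(step, total_steps):.2e}")
```

In PyTorch, the optimizer settings listed above would correspond to something like `torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=0.1)`, with the schedule applied on top.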

Supervised Fine-Tuning (SFT)

Dataset: roneneldan/TinyStoriesInstruct
Epochs: 1
Learning Rate: 1e-4
Loss Masking: Assistant-only (loss is computed only on the story completion tokens)
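Assistant-only masking is commonly implemented by setting the label of every prompt token to -100. That value is the default `ignore_index` of `torch.nn.CrossEntropyLoss`, and Hugging Face causal-LM models also skip it. The card does not say exactly how its masking was done, so treat this as a sketch under that assumption. The function name and token IDs are hypothetical.

```python
# Sketch of assistant-only loss masking for SFT. Label positions set to -100 are
# skipped by torch.nn.CrossEntropyLoss (its default ignore_index) and by
# Hugging Face causal-LM models, so only completion tokens contribute to the loss.
IGNORE_INDEX = -100


def mask_prompt_labels(prompt_ids: list[int], completion_ids: list[int],
                       max_length: int = 512) -> tuple[list[int], list[int]]:
    """Concatenate prompt and completion, masking the prompt in the labels.

    Sequences are truncated to max_length (the model's 512-token context).
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    return input_ids[:max_length], labels[:max_length]


ids, labels = mask_prompt_labels([11, 12, 13], [21, 22])
print(ids)     # [11, 12, 13, 21, 22]
print(labels)  # [-100, -100, -100, 21, 22]
```

Hugging Face causal-LM models shift the labels by one position internally, so the labels stay aligned with `input_ids` here rather than being pre-shifted.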

Tokenizer

Type: SentencePiece Unigram
Vocab Size: 16,384
Special Tokens: <pad>, <eos>, <bos>, <unk>, <|im_end|>

Evaluation

Metric	Value
Validation Loss	0.8291
Perplexity	2.2911
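The two reported metrics are consistent with each other: perplexity is the exponential of the validation loss. This quick check assumes the loss is the mean per-token cross-entropy in nats, which is what PyTorch's `CrossEntropyLoss` and the Hugging Face `Trainer` report. The constants are the full-precision values from the evaluation results.

```python
import math

# Full-precision values from the evaluation results.
validation_loss = 0.829051066686119
reported_perplexity = 2.2911436557769775

# Perplexity is the exponential of the mean per-token cross-entropy (in nats).
perplexity = math.exp(validation_loss)
print(f"{perplexity:.4f}")  # 2.2911
```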

50-Prompt Inference

See evaluation/50_prompts.json for generated story samples.

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the model and tokenizer from the Hugging Face Hub
model = AutoModelForCausalLM.from_pretrained("razor5050/TinyStories-45M")
tokenizer = AutoTokenizer.from_pretrained("razor5050/TinyStories-45M")

# Prompt in the Features / Words / Summary layout; the model continues after "Story:"
prompt = "Features: a brave cat\nWords: moon, adventure\nSummary: A cat goes on a moon adventure\nStory:"
inputs = tokenizer(prompt, return_tensors="pt")
# Sample up to 200 new tokens; prompt plus output must fit in the 512-token context
outputs = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
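To vary the prompt without hand-editing the string, a small helper can assemble the same Features / Words / Summary / Story layout used in the usage example. This is a sketch. `build_story_prompt` is a hypothetical name, and it reproduces only the fields shown in that example.

```python
# Hypothetical helper: assembles a prompt in the same layout as the usage example.
def build_story_prompt(features: str, words: list[str], summary: str) -> str:
    """Return a Features / Words / Summary prompt that ends with "Story:"."""
    return (
        f"Features: {features}\n"
        f"Words: {', '.join(words)}\n"
        f"Summary: {summary}\n"
        "Story:"
    )


prompt = build_story_prompt("a brave cat", ["moon", "adventure"],
                            "A cat goes on a moon adventure")
print(prompt)
```

The returned string can be passed to `tokenizer(prompt, return_tensors="pt")` in place of the hard-coded prompt.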

Hardware

Training GPU: NVIDIA RTX 3060 12GB
Training Time: ~8-10 hours (pretrain + SFT)

Citation

@article{roneneldan2023tinystories,
  title={TinyStories: How Small Can Language Models Be and Still Speak Coherent English?},
  author={Ronen Eldan and Yuanzhi Li},
  journal={arXiv preprint arXiv:2305.07759},
  year={2023}
}

Generated: 2026-05-20 18:37:02


Weights format: Safetensors
Model size: 45.5M params
Tensor type: F32
