TinyMathReason-1B-sft
TinyMathReason-1B-sft is a 1.12 Billion parameter Llama-style decoder-only transformer trained from scratch specifically for mathematical reasoning. This is the Supervised Fine-Tuned (SFT) variant.
Model Description
- Developed by: Himanshu Nakrani
- Model type: Decoder-only Transformer
- Language(s): English, Mathematics, Code
- License: Apache 2.0
- Architecture: 22 layers, 2048 hidden dimension, 16 Attention heads, 4 KV heads (GQA), SwiGLU activation (5632 intermediate dim).
- Parameters: 1.12B total
- Context Length: 4096 tokens
Training Details
Pretraining (Base Model)
The base model was trained from a random initialization on Google Cloud TPU v4-32 using the MaxText framework.
- Tokens: ~300 Billion
- Optimizer: AdamW (β1=0.9, β2=0.95, weight_decay=0.1)
- Learning Rate: 3e-4 peak, cosine decay to 3e-5
Supervised Fine-Tuning (SFT)
This variant was trained on ~600k instruction-following mathematical examples formatted in ChatML.
- Hardware: 1x A100 GPU using PyTorch + TRL
- Learning Rate: 2e-5 (Cosine schedule)
- Epochs: 2
Intended Uses & Limitations
Intended Uses:
- Solving step-by-step grade-school to high-school level math problems.
- Educational assistance and logic-based chain-of-thought generation.
- As a foundation for further preference optimization (e.g., DPO, GRPO).
Limitations:
- Being a 1B parameter model, it lacks the broad general knowledge of larger models.
- Prone to arithmetic hallucination on very large numbers.
- May fail on complex topology or advanced undergraduate mathematics.
Citation
@misc{tinymathreason2026,
author = {Himanshu Nakrani},
title = {TinyMathReason-1B: A 1 Billion Parameter Mathematical Reasoning LLM Built from Scratch on TPU v4-32},
year = {2026},
publisher = {GitHub},
howpublished = {\url{https://github.com/himanshu-nakrani/TinyMathReason-1B}}
}