TinyMathReason-1B-sft

TinyMathReason-1B-sft is a 1.12 Billion parameter Llama-style decoder-only transformer trained from scratch specifically for mathematical reasoning. This is the Supervised Fine-Tuned (SFT) variant.

Model Description

Developed by: Himanshu Nakrani
Model type: Decoder-only Transformer
Language(s): English, Mathematics, Code
License: Apache 2.0
Architecture: 22 layers, 2048 hidden dimension, 16 Attention heads, 4 KV heads (GQA), SwiGLU activation (5632 intermediate dim).
Parameters: 1.12B total
Context Length: 4096 tokens

Training Details

Pretraining (Base Model)

The base model was trained from a random initialization on Google Cloud TPU v4-32 using the MaxText framework.

Tokens: ~300 Billion
Optimizer: AdamW (β1=0.9, β2=0.95, weight_decay=0.1)
Learning Rate: 3e-4 peak, cosine decay to 3e-5

Supervised Fine-Tuning (SFT)

This variant was trained on ~600k instruction-following mathematical examples formatted in ChatML.

Hardware: 1x A100 GPU using PyTorch + TRL
Learning Rate: 2e-5 (Cosine schedule)
Epochs: 2

Intended Uses & Limitations

Intended Uses:

Solving step-by-step grade-school to high-school level math problems.
Educational assistance and logic-based chain-of-thought generation.
As a foundation for further preference optimization (e.g., DPO, GRPO).

Limitations:

Being a 1B parameter model, it lacks the broad general knowledge of larger models.
Prone to arithmetic hallucination on very large numbers.
May fail on complex topology or advanced undergraduate mathematics.

Citation

@misc{tinymathreason2026,
  author = {Himanshu Nakrani},
  title = {TinyMathReason-1B: A 1 Billion Parameter Mathematical Reasoning LLM Built from Scratch on TPU v4-32},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/himanshu-nakrani/TinyMathReason-1B}}
}

Downloads last month: 723

Safetensors

Model size

1B params

Tensor type

BF16

himanshunakrani9
/

TinyMathReason-1B-sft