TinyMathReason-1B-sft

TinyMathReason-1B-sft is a 1.12 Billion parameter Llama-style decoder-only transformer trained from scratch specifically for mathematical reasoning. This is the Supervised Fine-Tuned (SFT) variant.

Model Description

  • Developed by: Himanshu Nakrani
  • Model type: Decoder-only Transformer
  • Language(s): English, Mathematics, Code
  • License: Apache 2.0
  • Architecture: 22 layers, 2048 hidden dimension, 16 Attention heads, 4 KV heads (GQA), SwiGLU activation (5632 intermediate dim).
  • Parameters: 1.12B total
  • Context Length: 4096 tokens

Training Details

Pretraining (Base Model)

The base model was trained from a random initialization on Google Cloud TPU v4-32 using the MaxText framework.

  • Tokens: ~300 Billion
  • Optimizer: AdamW (β1=0.9, β2=0.95, weight_decay=0.1)
  • Learning Rate: 3e-4 peak, cosine decay to 3e-5

Supervised Fine-Tuning (SFT)

This variant was trained on ~600k instruction-following mathematical examples formatted in ChatML.

  • Hardware: 1x A100 GPU using PyTorch + TRL
  • Learning Rate: 2e-5 (Cosine schedule)
  • Epochs: 2

Intended Uses & Limitations

Intended Uses:

  • Solving step-by-step grade-school to high-school level math problems.
  • Educational assistance and logic-based chain-of-thought generation.
  • As a foundation for further preference optimization (e.g., DPO, GRPO).

Limitations:

  • Being a 1B parameter model, it lacks the broad general knowledge of larger models.
  • Prone to arithmetic hallucination on very large numbers.
  • May fail on complex topology or advanced undergraduate mathematics.

Citation

@misc{tinymathreason2026,
  author = {Himanshu Nakrani},
  title = {TinyMathReason-1B: A 1 Billion Parameter Mathematical Reasoning LLM Built from Scratch on TPU v4-32},
  year = {2026},
  publisher = {GitHub},
  howpublished = {\url{https://github.com/himanshu-nakrani/TinyMathReason-1B}}
}
Downloads last month
723
Safetensors
Model size
1B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Datasets used to train himanshunakrani9/TinyMathReason-1B-sft