language:
- en
license: apache-2.0
tags:
- text-generation
- math
- distilgpt2
- openwebmath
- arithmetic
datasets:
- openwebmath
metrics:
- exact_match
pipeline_tag: text-generation

DistilGPT2-Math

Model Description

deadMarkov/distilgpt2-math is a lightweight causal language model fine-tuned for arithmetic and mathematical reasoning. It is based on the DistilGPT2 architecture, with a modified vocabulary and targeted training intended to improve its handling of numbers and mathematical notation.

Key Modifications

  • Modified Vocabulary: The tokenizer and model vocabulary were modified to emphasize mathematical symbols, notation, and reasoning-specific tokens.
  • Supervised Fine-Tuning (SFT): The model was fine-tuned on a mixture of OpenWebMath and synthetic arithmetic data to reinforce step-by-step mathematical problem-solving.
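
The card does not document the format of the synthetic arithmetic data, so the following is only a minimal sketch of what such a generator could look like. The function names (make_example, make_dataset), the operand range, and the "Question/Answer" template are all assumptions made for illustration; the template simply mirrors the prompt style used in the usage example further down.

```python
import random

# Supported operations (an illustrative choice, not the author's documented set).
OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def make_example(rng: random.Random, max_operand: int = 99) -> str:
    """Build one question/answer pair with the worked equation in the answer."""
    a = rng.randint(0, max_operand)
    b = rng.randint(0, max_operand)
    op = rng.choice(sorted(OPS))  # sorted() keeps the choice order deterministic
    result = OPS[op](a, b)
    return f"Question: What is {a} {op} {b}?\nAnswer: {a} {op} {b} = {result}"


def make_dataset(n: int, seed: int = 0) -> list[str]:
    """Generate n examples reproducibly from a fixed seed."""
    rng = random.Random(seed)
    return [make_example(rng) for _ in range(n)]


if __name__ == "__main__":
    for example in make_dataset(3):
        print(example, end="\n\n")
```

Seeding the generator keeps the dataset reproducible across runs, which helps when comparing fine-tuning experiments on the same synthetic mixture.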

Model Details

  • Model Type: Causal Language Model
  • Base Architecture: DistilGPT2 (~82M parameters)
  • Language(s): English, Mathematics
  • License: Apache 2.0

Evaluation Results

The model was evaluated using the EleutherAI Language Model Evaluation Harness (v0.4.11).

GSM8K (Grade School Math)

Performance on the GSM8K dataset using a 5-shot prompt setting (greedy decoding, temperature = 0.0):

Task  | Metric                         | Shots | Score
GSM8K | Exact Match (Flexible Extract) | 5     | 0.99% (0.0098)
GSM8K | Exact Match (Strict Match)     | 5     | 0.00% (0.0000)
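
The gap between the two rows comes from how the answer is pulled out of the generated text. GSM8K reference solutions end with a line of the form "#### <number>"; strict matching only accepts a completion that reproduces that marker, while flexible extraction takes the last number found anywhere in the completion. The sketch below illustrates the difference with simplified regular expressions. These patterns and helper names are assumptions for illustration and are not copied from the evaluation harness.

```python
import re

# Simplified patterns (assumption): the harness uses its own filter definitions.
STRICT = re.compile(r"#### (-?[0-9.,]+)")
FLEXIBLE = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(number: str) -> str:
    """Drop thousands separators and any trailing period."""
    return number.replace(",", "").rstrip(".")


def strict_extract(text: str):
    """Return the number after the '#### ' marker, or None if it is absent."""
    match = STRICT.search(text)
    return normalize(match.group(1)) if match else None


def flexible_extract(text: str):
    """Return the last number that appears anywhere in the text, or None."""
    matches = FLEXIBLE.findall(text)
    return normalize(matches[-1]) if matches else None


def exact_match(prediction, gold: str) -> bool:
    """Compare an extracted prediction against the reference answer."""
    return prediction is not None and prediction == normalize(gold)


if __name__ == "__main__":
    completion = "5 + 7 = 12, so I have 12 apples."
    print(exact_match(strict_extract(completion), "12"))    # no '####' marker
    print(exact_match(flexible_extract(completion), "12"))  # last number is 12
```

A model that reaches the right number but never emits the "####" marker scores zero under strict matching, which is consistent with the 0.00% strict score reported above.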

Evaluation run details:

  • Framework: lm-evaluation-harness
  • Model Precision: torch.float32
  • Device: CUDA (RTX A4000)
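
The run can likely be reproduced through the harness's Python entry point. The sketch below assumes that lm-evaluation-harness (0.4.x) is installed and exposes lm_eval.simple_evaluate with the arguments shown; the helper names build_model_args and run_gsm8k and the device string are hypothetical, and the exact command used for the reported scores is not given in this card.

```python
def build_model_args(model_id: str, dtype: str = "float32") -> str:
    """Format the comma-separated model_args string used by the 'hf' backend."""
    return f"pretrained={model_id},dtype={dtype}"


def run_gsm8k(model_id: str, num_fewshot: int = 5, device: str = "cuda:0"):
    """Evaluate a Hugging Face model on GSM8K (downloads the model and dataset)."""
    import lm_eval  # imported lazily so the helper above works without the harness

    return lm_eval.simple_evaluate(
        model="hf",
        model_args=build_model_args(model_id),
        tasks=["gsm8k"],
        num_fewshot=num_fewshot,
        device=device,
    )


if __name__ == "__main__":
    results = run_gsm8k("deadMarkov/distilgpt2-math")
    # Per-task metrics are expected under results["results"]["gsm8k"].
    print(results["results"]["gsm8k"])
```

Scores may differ slightly from the table if the harness version, batch size, or hardware differs from the original run.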

How to Get Started with the Model

You can use this model directly with the Hugging Face transformers library:

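from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deadMarkov/distilgpt2-math"

# Load the tokenizer (with its modified vocabulary) and the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Question/answer prompt; the model continues the text after "Answer:".
prompt = "Question: If I have 5 apples and buy 7 more, how many do I have?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding (do_sample=False), matching the evaluation setting above.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))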

Intended Uses & Limitations

Intended Uses

  • Researching mathematical capability scaling in very small, distilled language models.
  • Experimenting with specialized tokenization/vocabularies for arithmetic tasks.
  • Serving as a lightweight baseline for small-scale math SFT experiments.

Limitations

  • As an ~82M-parameter model, its mathematical reasoning capacity is heavily constrained compared to larger (7B+) models. It struggles with multi-step word problems (0.99% flexible-extract exact match on 5-shot GSM8K) and should not be expected to handle advanced topics such as calculus.
  • The model may occasionally hallucinate numbers or lose track of arithmetic operations over long context windows.
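
Because generated arithmetic can be wrong, it may be worth checking simple equations in the output before trusting them. The following is a minimal sketch of such a check; the function name find_wrong_equations is hypothetical, and the pattern only covers integer equations of the form "a op b = c" with +, - or *.

```python
import re

# Matches integer equations such as "12 - 3 = 9" (illustrative pattern only).
EQUATION = re.compile(r"(-?\d+)\s*([+\-*])\s*(-?\d+)\s*=\s*(-?\d+)")

OPS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def find_wrong_equations(text: str) -> list[tuple[str, int]]:
    """Return (equation as written, correct result) for each incorrect equation."""
    wrong = []
    for a, op, b, claimed in EQUATION.findall(text):
        expected = OPS[op](int(a), int(b))
        if expected != int(claimed):
            wrong.append((f"{a} {op} {b} = {claimed}", expected))
    return wrong


if __name__ == "__main__":
    generated = "First, 5 + 7 = 12. Then 12 - 3 = 8."
    for equation, expected in find_wrong_equations(generated):
        print(f"Incorrect: {equation} (expected {expected})")
```

A check like this only catches errors in equations the model writes out explicitly; it does not validate the reasoning that connects them.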