---
language:
- en
license: apache-2.0
tags:
- text-generation
- math
- distilgpt2
- openwebmath
- arithmetic
datasets:
- openwebmath
metrics:
- exact_match
pipeline_tag: text-generation
---

DistilGPT2-Math

Model Description

deadMarkov/distilgpt2-math is a lightweight causal language model fine-tuned for mathematical reasoning and arithmetic. It is based on DistilGPT2, with a modified vocabulary and targeted training intended to improve its handling of numbers and mathematical notation.

Key Modifications

Modified Vocabulary: The tokenizer and model vocabulary were modified to better cover mathematical symbols, notation, and reasoning-specific tokens.
Supervised Fine-Tuning (SFT): The model was fine-tuned on a mixture of OpenWebMath and synthetic arithmetic data to reinforce step-by-step mathematical problem-solving.
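
The card does not document how the vocabulary was changed, so the most direct way to see the effect is to compare how the base and fine-tuned tokenizers split the same expression. The sketch below is a hypothetical helper (`compare_tokenization` is not part of the model repository) that accepts any object exposing a `tokenize` method, as Hugging Face tokenizers do. The commented usage assumes the base tokenizer is available under the `distilgpt2` id.

```python
def compare_tokenization(tokenizers, text):
    """Map each tokenizer name to (token count, token list) for `text`.

    `tokenizers` is a dict of name -> object with a `tokenize(str)` method.
    """
    report = {}
    for name, tok in tokenizers.items():
        tokens = tok.tokenize(text)
        report[name] = (len(tokens), tokens)
    return report


# Usage sketch (needs `transformers` and network access; model ids are assumptions):
# from transformers import AutoTokenizer
# tokenizers = {
#     "base": AutoTokenizer.from_pretrained("distilgpt2"),
#     "math": AutoTokenizer.from_pretrained("deadMarkov/distilgpt2-math"),
# }
# report = compare_tokenization(tokenizers, "12 * (3 + 45) = 576")
# for name, (count, tokens) in report.items():
#     print(name, count, tokens)
```

A lower token count for the same expression would suggest that the modified vocabulary represents numbers and operators more compactly, though that alone says nothing about accuracy.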

Model Details

Model Type: Causal Language Model
Base Architecture: DistilGPT2 (~82M parameters)
Language(s): English, Mathematics
License: Apache 2.0

Evaluation Results

The model was evaluated using the EleutherAI Language Model Evaluation Harness (v0.4.11).

GSM8K (Grade School Math)

Performance on the GSM8K dataset using a 5-shot prompt setting (greedy decoding, temperature = 0.0):

Task	Metric	Shots	Score
GSM8K	Exact Match (Flexible Extract)	5	0.99% (0.0098)
GSM8K	Exact Match (Strict Match)	5	0.00% (0.0000)
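
The gap between the two rows most likely reflects answer formatting rather than arithmetic alone. As far as the harness's GSM8K task is configured in recent versions, strict match only accepts a number that follows the `#### ` marker used in GSM8K reference solutions, while flexible extract takes the last number found anywhere in the completion. The sketch below approximates both filters with the standard `re` module. The patterns are recalled from the harness's task configuration and may differ in v0.4.11, and the helper names are hypothetical.

```python
import re

# Assumed approximations of the harness's two GSM8K answer filters.
STRICT = re.compile(r"#### (\-?[0-9\.\,]+)")
FLEXIBLE = re.compile(r"(-?[$0-9.,]{2,})|(-?[0-9]+)")


def strict_extract(text):
    """Return the first number that follows '#### ', or '[invalid]'."""
    match = STRICT.search(text)
    return match.group(1) if match else "[invalid]"


def flexible_extract(text):
    """Return the last number-like span in the text, or '[invalid]'."""
    matches = FLEXIBLE.findall(text)
    if not matches:
        return "[invalid]"
    # findall yields one tuple per match; exactly one group is non-empty.
    return next(group for group in matches[-1] if group)


def normalize(answer):
    """Drop commas, dollar signs, and a trailing period before comparing."""
    answer = re.sub(r"[,$]", "", answer)
    return re.sub(r"\.$", "", answer)


completion = "5 + 7 = 12 apples. The answer is 12."
print(strict_extract(completion))               # [invalid]
print(normalize(flexible_extract(completion)))  # 12
```

Under these assumptions, a completion that reaches the right number but never emits `#### ` scores zero on strict match while still counting under flexible extract.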

Evaluation run details:

Framework: lm-evaluation-harness
Model Precision: torch.float32
Device: CUDA (RTX A4000)
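
A run with the settings listed above could be reproduced roughly as follows through the harness's Python API. This is a sketch, not the author's actual command: the batch size and device string are assumptions, the function name `run_gsm8k` is hypothetical, and the keys of the returned dictionary depend on the harness version.

```python
def run_gsm8k(model_id="deadMarkov/distilgpt2-math", device="cuda:0", batch_size=8):
    """Evaluate `model_id` on 5-shot GSM8K with lm-evaluation-harness."""
    # Imported lazily so this module loads even where lm_eval is not installed.
    import lm_eval

    results = lm_eval.simple_evaluate(
        model="hf",
        model_args=f"pretrained={model_id},dtype=float32",
        tasks=["gsm8k"],
        num_fewshot=5,
        device=device,
        batch_size=batch_size,  # assumption: the card does not state a batch size
    )
    return results["results"]["gsm8k"]


# Usage (downloads the model and dataset; a GPU is assumed):
# for metric, value in run_gsm8k().items():
#     print(metric, value)
```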

How to Get Started with the Model

You can use this model directly with the Hugging Face transformers library:

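from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deadMarkov/distilgpt2-math"

# Load the tokenizer (with the modified math vocabulary) and the model weights.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Phrase the prompt as a question followed by an "Answer:" cue.
prompt = "Question: If I have 5 apples and buy 7 more, how many do I have?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Decode greedily, as in the evaluation, and cap the continuation at 50 new tokens.
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))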
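
The GSM8K evaluation used 5-shot prompting, and a model this small may follow the answer format more reliably when a few worked examples precede the question. The helper below is a minimal sketch of building such a prompt in the same Question/Answer layout as the snippet above. `build_prompt` and the exemplars are hypothetical, written only for illustration, and are not the examples used in the evaluation.

```python
def build_prompt(question, exemplars=()):
    """Join worked examples and a new question in Question/Answer format."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in exemplars]
    # The final block ends with the bare cue so the model writes the answer.
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical exemplars, for illustration only.
exemplars = [
    ("What is 3 + 4?", "3 + 4 = 7. The answer is 7."),
    ("What is 6 * 5?", "6 * 5 = 30. The answer is 30."),
]

prompt = build_prompt(
    "If I have 5 apples and buy 7 more, how many do I have?", exemplars
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the single-question prompt. Longer prompts use up more of the model's limited context window, so a small number of short exemplars is the safer choice.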

Intended Uses & Limitations

Intended Uses

Researching mathematical capability scaling in very small, distilled language models.
Experimenting with specialized tokenization/vocabularies for arithmetic tasks.
Serving as a lightweight baseline for small-scale math SFT experiments.

Limitations

As an ~82M-parameter model, its mathematical reasoning capacity is heavily constrained compared to larger (7B+) models. It struggles with multi-step word problems, as the GSM8K scores above show, and is unlikely to handle more advanced topics such as calculus.
The model may occasionally hallucinate numbers or lose track of arithmetic operations over long context windows.
