---
language:
- en
license: apache-2.0
tags:
- text-generation
- math
- distilgpt2
- openwebmath
- arithmetic
datasets:
- openwebmath
metrics:
- exact_match
pipeline_tag: text-generation
---
DistilGPT2-Math
Model Description
deadMarkov/distilgpt2-math is a lightweight causal language model fine-tuned for mathematical reasoning and arithmetic. It is built on the DistilGPT2 architecture, with a modified vocabulary and targeted training intended to improve its handling of numbers and mathematical notation.
Key Modifications
- Modified Vocabulary: The tokenizer and model vocabulary were modified to emphasize mathematical symbols, notation, and reasoning-specific tokens.
- Supervised Fine-Tuning (SFT): The model was fine-tuned on a mixture of OpenWebMath and targeted synthetic arithmetic data to reinforce step-by-step mathematical problem-solving.
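One way to see the effect of the modified vocabulary is to count how many tokens each tokenizer needs for the same expressions. The following is a minimal sketch, not part of the original release: the helper names `token_counts` and `compare_with_base` are hypothetical, and it assumes the base tokenizer is available on the Hub under the id `distilgpt2`.

```python
from typing import Callable, Dict, List


def token_counts(tokenize: Callable[[str], List[str]],
                 expressions: List[str]) -> Dict[str, int]:
    """Return the number of tokens produced for each expression."""
    return {expr: len(tokenize(expr)) for expr in expressions}


def compare_with_base(expressions: List[str]) -> None:
    """Print token counts for the base and math tokenizers side by side.

    Requires network access; the base model id "distilgpt2" is an assumption.
    """
    from transformers import AutoTokenizer

    base = AutoTokenizer.from_pretrained("distilgpt2")
    math_tok = AutoTokenizer.from_pretrained("deadMarkov/distilgpt2-math")

    base_counts = token_counts(base.tokenize, expressions)
    math_counts = token_counts(math_tok.tokenize, expressions)
    for expr in expressions:
        print(f"{expr!r}: base={base_counts[expr]} math={math_counts[expr]}")
```

Calling `compare_with_base(["12345 + 678 = 13023", "x^2 + 2x + 1"])` prints one line per expression. Fewer tokens is not automatically better for arithmetic, so the counts are best read alongside task accuracy.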
Model Details
- Model Type: Causal Language Model
- Base Architecture: DistilGPT2 (~82M parameters)
- Language(s): English, Mathematics
- License: Apache 2.0
Evaluation Results
The model was evaluated using the EleutherAI Language Model Evaluation Harness (v0.4.11).
GSM8K (Grade School Math)
Performance on the GSM8K dataset using a 5-shot prompt setting (greedy decoding, temperature = 0.0):
| Task | Metric | Shots | Score |
|---|---|---|---|
| GSM8K | Exact Match (Flexible Extract) | 5 | 0.99% (0.0098) |
| GSM8K | Exact Match (Strict Match) | 5 | 0.00% (0.0000) |
Evaluation run details:
- Framework: lm-evaluation-harness
- Model Precision: torch.float32
- Device: CUDA (RTX A4000)
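The gap between the two GSM8K metrics likely reflects output format as much as arithmetic ability. The sketch below illustrates the difference under the assumption that strict match requires the dataset's `#### <number>` answer marker, while flexible extract accepts the last number found anywhere in the completion. The regular expressions and function names are simplified illustrations, not the harness's own code.

```python
import re
from typing import Optional

# Assumed, simplified patterns; the harness's actual filters may differ.
STRICT_PATTERN = re.compile(r"#### (-?[0-9.,]+)")
NUMBER_PATTERN = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def normalize(number: str) -> str:
    """Drop thousands separators and a trailing period."""
    return number.replace(",", "").rstrip(".")


def strict_match(completion: str) -> Optional[str]:
    """Return the number after '#### ', or None if the marker is missing."""
    match = STRICT_PATTERN.search(completion)
    return normalize(match.group(1)) if match else None


def flexible_extract(completion: str) -> Optional[str]:
    """Return the last number appearing in the completion, if any."""
    numbers = NUMBER_PATTERN.findall(completion)
    return normalize(numbers[-1]) if numbers else None


completion = "5 + 7 = 12 apples."
print(strict_match(completion))      # no '####' marker, so no strict answer
print(flexible_extract(completion))  # last number in the text
```

Under these assumptions, a model that reaches the right number but never emits the `####` marker scores zero on strict match, which is consistent with the 0.00% strict score reported above.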
How to Get Started with the Model
You can use this model directly with the Hugging Face transformers library:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the fine-tuned checkpoint and its modified tokenizer from the Hub.
model_id = "deadMarkov/distilgpt2-math"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Frame the problem as a Question/Answer pair and tokenize it.
prompt = "Question: If I have 5 apples and buy 7 more, how many do I have?\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate up to 50 new tokens and print the prompt plus completion.
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
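The prompt above is zero-shot, whereas the GSM8K scores reported earlier used 5 shots. A few-shot prompt can be assembled by prepending solved examples in the same Question/Answer layout. This is a minimal sketch: the helper `build_few_shot_prompt`, the blank-line separator, and the two worked examples are illustrative assumptions rather than the exact prompt used in the evaluation.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], question: str) -> str:
    """Join solved (question, answer) pairs and end with the open question."""
    blocks = [f"Question: {q}\nAnswer: {a}" for q, a in examples]
    blocks.append(f"Question: {question}\nAnswer:")
    return "\n\n".join(blocks)


# Hypothetical worked examples, written in the GSM8K '#### <number>' style.
examples = [
    ("What is 2 + 3?", "2 + 3 = 5\n#### 5"),
    ("Tom has 4 pens and loses 1. How many are left?", "4 - 1 = 3\n#### 3"),
]

prompt = build_few_shot_prompt(
    examples, "If I have 5 apples and buy 7 more, how many do I have?"
)
print(prompt)
```

The resulting string can be passed to the tokenizer in place of the single-question prompt. Longer prompts consume more of the model's limited context window, so the number of examples is a trade-off.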
Intended Uses & Limitations
Intended Uses
- Researching mathematical capability scaling in very small, distilled language models.
- Experimenting with specialized tokenization/vocabularies for arithmetic tasks.
- Serving as a lightweight baseline for small-scale math SFT experiments.
Limitations
- As an ~82M-parameter model, its mathematical reasoning capacity is heavily constrained compared to larger (7B+) models. It struggles with multi-step word problems, as the GSM8K scores above show, and with advanced topics such as calculus.
- The model may occasionally hallucinate numbers or lose track of arithmetic operations over long context windows.