QLM Socratic Math Tutor
A Llama 3.1 8B Instruct model fine-tuned with LoRA to act as a Socratic math tutor for K-12 students. The model is trained to withhold answers and instead ask guiding questions that help students reason through math problems themselves.
Key Results (Rigorous Evaluation, 95% CI)
| Metric | Score | 95% CI | n |
|---|---|---|---|
| Socratic question rate | 100% | [98%, 100%] | 200 |
| Relevance to specific student error | 74.5% | [68%, 80%] | 200 |
| Answer avoidance rate | 96% | [92%, 98%] | 200 |
| Answer leak rate | 1% | [0.2%, 5.4%] | 100 |
| Grade-appropriate language | 100% | [98%, 100%] | 200 |
All metrics were evaluated with heuristic scoring (no LLM-as-judge) under production conditions, with mission context, vocabulary hints, and misconception targeting.
How It Works
The model is trained to be Socratic: when a student makes an error, instead of correcting them, it asks a question that helps them discover the error themselves.
Student: "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms."
Model: "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have less than 1/3 of a pizza total? Try drawing both fractions on the same circle."
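The model's hint works because the student's answer is smaller than one of the two parts being added. A minimal check of that arithmetic, using only Python's standard library (variable names are illustrative):

```python
from fractions import Fraction

claimed = Fraction(2, 7)                  # the student's answer
actual = Fraction(1, 3) + Fraction(1, 4)  # the sum computed exactly

# The student's total is smaller than 1/3 alone, which is the
# contradiction the guiding question points at.
print(claimed < Fraction(1, 3))  # True
print(actual)                    # 7/12
```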
Usage
With PEFT (recommended)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch
# Load base model (requires Llama access)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")
# Build prompt
system = "You are a Socratic math tutor for grade 6-8 students. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences."
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms"},
]
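# add_generation_prompt=True appends the assistant header so the model starts a new reply
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)
# Decode only the newly generated tokens
response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)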
With 4-bit Quantization (for consumer GPUs)
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# 4-bit loading requires the bitsandbytes package
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")
# Tokenizer setup and generation code are the same as above
System Prompt
The model expects the standard Llama chat format, with a system prompt that instructs Socratic tutoring behavior. A simple system prompt works:
You are a Socratic math tutor. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences.
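The usage example adds a grade band ("for grade 6-8 students") to this prompt. A small sketch of a helper that builds the chat messages with or without one; `build_messages` and `grade_band` are hypothetical names for illustration, not part of the released model or any library:

```python
# Template built from the two prompt variants shown in this card.
BASE_PROMPT = (
    "You are a Socratic math tutor{audience}. Never give the answer. "
    "Ask guiding questions. Keep responses to 2-3 sentences."
)


def build_messages(student_text, grade_band=None):
    """Return chat messages for tokenizer.apply_chat_template (hypothetical helper)."""
    audience = f" for grade {grade_band} students" if grade_band else ""
    return [
        {"role": "system", "content": BASE_PROMPT.format(audience=audience)},
        {"role": "user", "content": student_text},
    ]


messages = build_messages(
    "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms",
    grade_band="6-8",
)
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.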
Training
Base model: meta-llama/Llama-3.1-8B-Instruct
Method: LoRA
Training data: Synthetic tutoring interactions across K-12 mathematics
Hardware: HuggingFace L4 GPU (24GB)
Training time: ~4 hours
Final loss: 0.306
Limitations
Synthetic training data: The model was trained on synthetic data, not real classroom tutoring transcripts. This limits scaffolding specificity: 28% of responses target the specific error, while 68% ask relevant but generic guiding questions.
Answer leak rate: 1% of responses contain the correct answer (detected by exact numeric matching). An answer-leak filter is deployed in production.
Math only: Trained exclusively on K-12 mathematics. Performance on other STEM subjects is untested.
No longitudinal validation: No classroom outcome data yet. Benchmark results measure response quality, not learning gains.
Heuristic evaluation: All evaluation uses keyword/heuristic scoring, not human expert annotation. Human evaluation with math teachers is planned.
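The production answer-leak filter is not published. As an illustration of the exact numeric matching mentioned under "Answer leak rate", the sketch below extracts numbers from a response and compares them by value with the known answer. The function names, the regular expression, and the fallback text are all assumptions, not the QLM implementation:

```python
import re
from fractions import Fraction

# Integers (42), decimals (0.25), and simple fractions (7/12).
# The lookbehind keeps "5-3" from being read as 5 and -3.
_NUMBER = re.compile(r"(?<![\d.])-?\d+(?:\.\d+)?(?:/\d+)?")


def numbers_in(text):
    """Return the set of numeric values found in text, as exact Fractions."""
    values = set()
    for token in _NUMBER.findall(text):
        try:
            values.add(Fraction(token))
        except (ValueError, ZeroDivisionError):
            continue  # skip tokens such as "1.5/2" or "3/0"
    return values


def leaks_answer(response, correct_answer):
    """True if the response contains a number equal to the correct answer."""
    return Fraction(correct_answer) in numbers_in(response)


def safe_response(response, correct_answer, fallback="What could you try next?"):
    """Hypothetical filter: swap a leaking response for a generic question."""
    return fallback if leaks_answer(response, correct_answer) else response


print(leaks_answer("So the total is 7/12 of the pizza.", "7/12"))
print(leaks_answer("Would you really have less than 1/3 of a pizza total?", "7/12"))
```

Comparing by value means an equivalent form such as 14/24 or 0.5 also counts as a leak. Matching of this kind misses answers written out in words.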
Evaluation Methodology
All metrics are reported with 95% confidence intervals. The tutor model was evaluated on n=200 (Socratic quality), n=50 (scaffolding), and n=100 (answer leak). No LLM-as-judge is used; all scoring is heuristic to avoid circularity.
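This card does not name the interval method. The reported intervals are consistent with a Wilson score interval, so the sketch below assumes that method. The success counts are derived from the table's percentages and n values, and `wilson_interval` is an illustrative name:

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at p = 0 or p = 1.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts derived from the Key Results table (assumption: score * n).
rows = [
    ("Socratic question rate", 200, 200),
    ("Relevance to specific student error", 149, 200),
    ("Answer avoidance rate", 192, 200),
    ("Answer leak rate", 1, 100),
]
for label, k, n in rows:
    lower, upper = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} [{lower:.1%}, {upper:.1%}]")
```

Unlike the normal approximation, the Wilson interval stays inside [0, 1] and does not collapse to zero width when the observed rate is 0% or 100%, which matters for the two metrics reported at 100%.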
Full benchmark results: quantumlearningmachines.com/research/external-benchmark-results
Part of a Larger System
This tutor model is one component of the QLM platform, an integrated system for adaptive math learning. The model weights are open. The measurement and orchestration systems that train and improve the model are proprietary.
Citation
@misc{qlm-math-tutor-2026,
title={QLM Socratic Math Tutor: An Open-Source Llama 3.1 8B LoRA for K-12 Mathematics},
author={Quantum Learning Machines},
year={2026},
url={https://huggingface.co/QuantumLearningMachines/qlm-math-tutor},
}
Contact
- Try the tutor: quantumlearningmachines.com/try-math-tutor
- Benchmarks: quantumlearningmachines.com/research
- Partnerships: hello@quantumlearningmachines.com