QLM Socratic Math Tutor

A Llama 3.1 8B Instruct model fine-tuned with LoRA to act as a Socratic math tutor for K-12 students. Rather than giving answers, it asks guiding questions that help students reason through math problems themselves.

Key Results (Rigorous Evaluation, 95% CI)

Metric	Score	95% CI	n
Socratic question rate	100%	[98%, 100%]	200
Relevance to specific student error	74.5%	[68%, 80%]	200
Answer avoidance rate	96%	[92%, 98%]	200
Answer leak rate	1%	[0.2%, 5.4%]	100
Grade-appropriate language	100%	[98%, 100%]	200

All metrics evaluated with heuristic scoring (no LLM-as-judge) under production conditions with mission context, vocabulary hints, and misconception targeting.
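
The card does not say how the 95% intervals were computed. The reported bounds are consistent with a Wilson score interval, so the sketch below assumes that method. The function name `wilson_interval` and the success counts (200 of 200, 1 of 100) are inferred from the table for illustration and are not taken from the evaluation code.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Return the (lower, upper) Wilson score interval for a binomial proportion.

    Assumption: the card's intervals use this method; the source does not name it.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point overshoot at the extremes.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Counts implied by the table rows (inferred, not from the evaluation code).
rows = [("Socratic question rate", 200, 200), ("Answer leak rate", 1, 100)]
for label, k, n in rows:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} [{low:.1%}, {high:.1%}]")
```

With z = 1.96 this gives bounds that match the table's intervals for these two rows after rounding. Unlike the normal approximation, the Wilson interval stays informative when the observed rate is 0% or 100%, which matters for the rows reported at 100%.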

How It Works

The model is trained to be Socratic: when a student makes an error, instead of correcting them, it asks a question that helps them discover the error themselves.

Student: "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms."

Model: "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have less than 1/3 of a pizza total? Try drawing both fractions on the same circle."

Usage

With PEFT (recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model (requires Llama access)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")

# Build prompt
system = "You are a Socratic math tutor for grade 6-8 students. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms"},
]

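# add_generation_prompt=True appends the assistant header so the model replies as the tutor
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)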

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

With 4-bit Quantization (for consumer GPUs)

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")
# Load the tokenizer and run the same generation code as in the PEFT example above

System Prompt

The model expects the standard Llama chat format, with a system prompt that instructs Socratic tutoring behavior. A simple system prompt works:

You are a Socratic math tutor. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences.
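
The usage example adds a grade band ("for grade 6-8 students") to this base prompt. A small helper can keep the two variants consistent. This is a minimal sketch: `build_messages`, `BASE_PROMPT`, and the `grade_band` parameter are hypothetical names introduced here, not part of the released model or the QLM platform.

```python
# Hypothetical helper: builds the chat messages list from the card's system prompt.
BASE_PROMPT = (
    "You are a Socratic math tutor{audience}. Never give the answer. "
    "Ask guiding questions. Keep responses to 2-3 sentences."
)


def build_messages(student_text, grade_band=None):
    """Return a messages list suitable for tokenizer.apply_chat_template.

    grade_band is an optional string such as "6-8"; when omitted, the
    generic prompt from this section is used unchanged.
    """
    audience = f" for grade {grade_band} students" if grade_band else ""
    return [
        {"role": "system", "content": BASE_PROMPT.format(audience=audience)},
        {"role": "user", "content": student_text.strip()},
    ]


messages = build_messages(
    "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms",
    grade_band="6-8",
)
print(messages[0]["content"])
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` in the usage example.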

Training

Base model: meta-llama/Llama-3.1-8B-Instruct
Method: LoRA
Training data: Synthetic tutoring interactions across K-12 mathematics
Hardware: NVIDIA L4 GPU (24 GB) on Hugging Face
Training time: ~4 hours
Final loss: 0.306

Limitations

Synthetic training data: The model was trained on synthetic data, not real classroom tutoring transcripts. This limits scaffolding specificity — 28% of responses target the specific error, while 68% ask relevant but generic guiding questions.
Answer leak rate: 1% of responses contain the correct answer (detected by exact numeric matching). An answer-leak filter is deployed in production.
Math only: Trained exclusively on K-12 mathematics. Performance on other STEM subjects is untested.
No longitudinal validation: No classroom outcome data yet. Benchmark results measure response quality, not learning gains.
Heuristic evaluation: All evaluation uses keyword/heuristic scoring, not human expert annotation. Human evaluation with math teachers is planned.
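
The production answer-leak filter is not described beyond "exact numeric matching". As an illustration only, one way to implement such a check is to extract every number from a response and compare it with the known correct answer. The function name `leaks_answer` and the choice to compare by value (so that 14/24 counts as 7/12) are assumptions of this sketch.

```python
import re
from fractions import Fraction

# Matches fractions like 7/12, decimals like 0.5, and integers like 42.
_NUMBER = re.compile(r"-?\d+\s*/\s*\d+|-?\d+\.\d+|-?\d+")


def _to_fraction(token):
    """Parse a numeric token into an exact Fraction, or None if it is invalid."""
    token = re.sub(r"\s+", "", token)
    try:
        return Fraction(token)
    except (ValueError, ZeroDivisionError):
        return None


def leaks_answer(response, correct_answer):
    """Return True if any number in the response equals the correct answer.

    Hypothetical sketch: the real filter's rules are not published.
    """
    target = _to_fraction(correct_answer)
    if target is None:
        return False
    values = (_to_fraction(tok) for tok in _NUMBER.findall(response))
    return any(value == target for value in values)


tutor_reply = (
    "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really "
    "have less than 1/3 of a pizza total?"
)
print(leaks_answer(tutor_reply, "7/12"))
print(leaks_answer("So the sum is 7/12.", "7/12"))
```

A check like this can produce false positives when the correct answer also appears in the student's own problem statement, and a hyphenated range such as "2-3" is read as 2 and -3, so a deployed filter would likely need extra rules.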

Evaluation Methodology

All metrics are reported with 95% confidence intervals. The tutor model was evaluated on n=200 (Socratic quality), n=50 (scaffolding), and n=100 (answer leak). No LLM-as-judge was used; all scoring is heuristic to avoid circularity.

Full benchmark results: quantumlearningmachines.com/research/external-benchmark-results

Part of a Larger System

This tutor model is one component of the QLM platform — an integrated system for adaptive math learning. The model weights are open. The measurement and orchestration systems that train and improve the model are proprietary.

Citation

@misc{qlm-math-tutor-2026,
  title={QLM Socratic Math Tutor: An Open-Source Llama 3.1 8B LoRA for K-12 Mathematics},
  author={Quantum Learning Machines},
  year={2026},
  url={https://huggingface.co/QuantumLearningMachines/qlm-math-tutor},
}

Contact

Try the tutor: quantumlearningmachines.com/try-math-tutor
Benchmarks: quantumlearningmachines.com/research
Partnerships: hello@quantumlearningmachines.com
