QLM Socratic Math Tutor

A Llama 3.1 8B Instruct model fine-tuned with LoRA to act as a Socratic math tutor for K-12 students. The model is trained not to give answers; instead, it asks guiding questions that help students reason through math problems themselves.

Key Results (Rigorous Evaluation, 95% CI)

| Metric | Score | 95% CI | n |
|---|---|---|---|
| Socratic question rate | 100% | [98%, 100%] | 200 |
| Relevance to specific student error | 74.5% | [68%, 80%] | 200 |
| Answer avoidance rate | 96% | [92%, 98%] | 200 |
| Answer leak rate | 1% | [0.2%, 5.4%] | 100 |
| Grade-appropriate language | 100% | [98%, 100%] | 200 |

All metrics were evaluated with heuristic scoring (no LLM-as-judge) under production conditions, with mission context, vocabulary hints, and misconception targeting enabled.

How It Works

The model is trained to be Socratic: when a student makes an error, instead of correcting them, it asks a question that helps them discover the error themselves.

Student: "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms."

Model: "If you had 1/3 of a pizza and 1/4 of the same pizza, would you really have less than 1/3 of a pizza total? Try drawing both fractions on the same circle."
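
The arithmetic behind this guiding question can be checked directly. The short sketch below uses Python's standard `fractions` module (an illustration only, not part of the model or its training code) to show that the student's result of 2/7 is smaller than 1/3 on its own, which is the contradiction the tutor's question points toward.

```python
from fractions import Fraction

a, b = Fraction(1, 3), Fraction(1, 4)

# The student's method: add the numerators and the denominators separately.
student_answer = Fraction(a.numerator + b.numerator, a.denominator + b.denominator)

# Exact sum, which Fraction computes over a common denominator.
correct_sum = a + b

print(student_answer)      # 2/7
print(student_answer < a)  # True: the "sum" is smaller than one of its parts
print(correct_sum)         # 7/12
```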

Usage

With PEFT (recommended)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model (requires Llama access)
base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    torch_dtype=torch.float16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")

# Load LoRA adapter
model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")

# Build prompt
system = "You are a Socratic math tutor for grade 6-8 students. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences."

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": "I think 1/3 + 1/4 = 2/7 because I added the tops and bottoms"},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header so the model starts a new reply
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=150, temperature=0.7, do_sample=True)

response = tokenizer.decode(output[0][input_ids.shape[1]:], skip_special_tokens=True)
print(response)

With 4-bit Quantization (for consumer GPUs)

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",
    quantization_config=quantization_config,
    device_map="auto",
)

model = PeftModel.from_pretrained(base_model, "QuantumLearningMachines/qlm-math-tutor")
# Same generation code as above
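
As a rough guide to why 4-bit loading helps on consumer GPUs, the sketch below estimates the memory needed for the base model's weights alone. It assumes a nominal 8 billion parameters (the exact count differs slightly) and ignores activations, the KV cache, the LoRA adapter, and quantization overhead, so real usage will be higher. The helper name `weight_memory_gib` is made up for this illustration.

```python
def weight_memory_gib(n_params, bits_per_param):
    """Approximate memory for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30

n_params = 8e9  # assumption: nominal size of an "8B" model

print(f"float16: {weight_memory_gib(n_params, 16):.1f} GiB")  # float16: 14.9 GiB
print(f"4-bit:   {weight_memory_gib(n_params, 4):.1f} GiB")   # 4-bit:   3.7 GiB
```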

System Prompt

The model expects the standard Llama chat format, with a system prompt that instructs Socratic tutoring behavior. A simple system prompt works:

You are a Socratic math tutor. Never give the answer. Ask guiding questions. Keep responses to 2-3 sentences.
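
The usage example adds a grade band ("for grade 6-8 students") to this prompt. One way to build both variants from a single template is sketched below. The names `BASE_PROMPT` and `build_messages` are hypothetical helpers, not part of the released code, and the production prompt (with mission context and vocabulary hints) is not documented here.

```python
# Hypothetical template: the wording matches the two prompts shown in this card.
BASE_PROMPT = (
    "You are a Socratic math tutor{audience}. Never give the answer. "
    "Ask guiding questions. Keep responses to 2-3 sentences."
)

def build_messages(student_text, grade_band=None):
    """Return a chat message list, optionally targeting a grade band such as '6-8'."""
    audience = f" for grade {grade_band} students" if grade_band else ""
    return [
        {"role": "system", "content": BASE_PROMPT.format(audience=audience)},
        {"role": "user", "content": student_text},
    ]

messages = build_messages("I think 1/3 + 1/4 = 2/7", grade_band="6-8")
print(messages[0]["content"])
```

The resulting list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.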

Training

  • Base model: meta-llama/Llama-3.1-8B-Instruct

  • Method: LoRA

  • Training data: Synthetic tutoring interactions across K-12 mathematics

  • Hardware: Hugging Face L4 GPU (24GB)

  • Training time: ~4 hours

  • Final loss: 0.306

Limitations

  1. Synthetic training data: The model was trained on synthetic data, not real classroom tutoring transcripts. This limits scaffolding specificity — 28% of responses target the specific error, while 68% ask relevant but generic guiding questions.

  2. Answer leak rate: 1% of responses contain the correct answer (detected by exact numeric matching). An answer-leak filter is deployed in production.

  3. Math only: Trained exclusively on K-12 mathematics. Performance on other STEM subjects is untested.

  4. No longitudinal validation: No classroom outcome data yet. Benchmark results measure response quality, not learning gains.

  5. Heuristic evaluation: All evaluation uses keyword/heuristic scoring, not human expert annotation. Human evaluation with math teachers is planned.
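
Limitation 2 mentions detecting leaks by numeric matching. The production filter is not published, so the following is only a minimal sketch of how such a check might look. The function `leaks_answer` and its regular expression are assumptions made for illustration. This version compares values rather than strings, so `14/24` or `0.5` would also count as matching `7/12` or `1/2` respectively.

```python
import re
from fractions import Fraction

# Matches fractions like 7/12, decimals like 0.5, and integers like 42.
_NUMBER = re.compile(r"-?\d+\s*/\s*\d+|-?\d+\.\d+|-?\d+")

def _to_fraction(token):
    """Convert a numeric token to an exact Fraction, or None if it is invalid."""
    try:
        return Fraction(token.replace(" ", ""))
    except (ValueError, ZeroDivisionError):
        return None

def leaks_answer(response, correct_answer):
    """Return True if any number in the response equals the correct answer."""
    target = _to_fraction(correct_answer)
    if target is None:
        return False
    return any(_to_fraction(tok) == target for tok in _NUMBER.findall(response))

print(leaks_answer("The answer is 7/12.", "7/12"))                       # True
print(leaks_answer("Would you have less than 1/3 of a pizza?", "7/12"))  # False
```

A check like this only catches answers written as digits. Answers spelled out in words ("seven twelfths") would need separate handling.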

Evaluation Methodology

All metrics are reported with 95% confidence intervals. The tutor model was evaluated on n=200 (Socratic quality), n=50 (scaffolding), and n=100 (answer leak). No LLM-as-judge is used; all scoring is heuristic to avoid circularity.
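
The card does not say how the confidence intervals were computed. The intervals in the results table appear consistent with the Wilson score interval for a binomial proportion, so the sketch below uses that method as an assumption. The function name `wilson_interval` is illustrative, and `z=1.96` is the usual approximation for a 95% interval.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the boundaries.
    return max(0.0, center - half_width), min(1.0, center + half_width)

# Counts implied by the reported scores and sample sizes.
for label, k, n in [
    ("Socratic question rate", 200, 200),
    ("Relevance to specific error", 149, 200),
    ("Answer leak rate", 1, 100),
]:
    low, high = wilson_interval(k, n)
    print(f"{label}: {k / n:.1%} [{low:.1%}, {high:.1%}]")
```

Unlike the simple normal approximation, the Wilson interval stays within [0, 1] and gives a non-zero width when the observed rate is 0% or 100%, which matters for the two 100% rows.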

Full benchmark results: quantumlearningmachines.com/research/external-benchmark-results

Part of a Larger System

This tutor model is one component of the QLM platform — an integrated system for adaptive math learning. The model weights are open. The measurement and orchestration systems that train and improve the model are proprietary.

Citation

@misc{qlm-math-tutor-2026,
  title={QLM Socratic Math Tutor: An Open-Source Llama 3.1 8B LoRA for K-12 Mathematics},
  author={Quantum Learning Machines},
  year={2026},
  url={https://huggingface.co/QuantumLearningMachines/qlm-math-tutor},
}
