GCSE English Tutor (Gemma-4-26B-A4B MoE)

A fine-tune of unsloth/gemma-4-26b-a4b-it (Gemma 4 26B A4B) that emulates a UK GCSE English Literature and Language tutor: it reasons explicitly before answering, models AQA/Edexcel marking conventions, and gives constructive feedback on student essays.

Training used Unsloth + TRL on a single NVIDIA DGX Spark GB10 (121 GB unified memory).

Intended use

A study companion for GCSE English students: explaining literary techniques, walking through poem/extract analysis, modelling PEEL/PETAL paragraphs, and giving formative feedback on student writing. Designed to be served on-device or via llama-swap.
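For the llama-swap route, a client could look roughly like the sketch below. It assumes llama-swap is exposing its OpenAI-compatible chat-completions endpoint on localhost:8080 and that its config defines a model entry called "gcse-tutor"; the host, port, and model name are hypothetical and not taken from this card.

```python
import json
import urllib.request

# Assumptions: llama-swap listens on localhost:8080 and its config defines
# a model entry named "gcse-tutor". Both values are hypothetical.
BASE_URL = "http://localhost:8080"
MODEL_NAME = "gcse-tutor"


def build_chat_request(question: str, max_tokens: int = 600) -> dict:
    """Build an OpenAI-style chat-completions payload for one student question."""
    return {
        "model": MODEL_NAME,
        "messages": [{"role": "user", "content": question}],
        "max_tokens": max_tokens,
        "temperature": 0.0,
    }


def ask_tutor(question: str) -> str:
    """POST the payload to the proxy and return the assistant's reply text."""
    body = json.dumps(build_chat_request(question)).encode("utf-8")
    req = urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]
```

Calling ask_tutor("Explain what a PEEL paragraph is.") would return the raw reply text, including any reasoning block the model emits.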

Sample prompt

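from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

repo = "RichWoollcott/gcse-tutor-gemma4-26b-moe"
tok = AutoTokenizer.from_pretrained(repo)
# Load the merged 16-bit weights in bfloat16 on a single GPU.
model = AutoModelForCausalLM.from_pretrained(
    repo, torch_dtype=torch.bfloat16, device_map="cuda"
)

messages = [{"role": "user", "content": "Can you help me write a PEEL paragraph analysing how Stevenson uses setting to convey duality in Dr Jekyll and Mr Hyde?"}]
# return_dict=True returns input_ids together with the attention mask,
# so generate() receives both.
inputs = tok.apply_chat_template(
    messages, return_tensors="pt", add_generation_prompt=True, return_dict=True
).to("cuda")
# Greedy decoding, capped at 600 new tokens.
out = model.generate(**inputs, max_new_tokens=600, do_sample=False)
# Decode only the newly generated tokens; skip_special_tokens=False keeps
# any special tokens in the printed text.
prompt_len = inputs["input_ids"].shape[1]
print(tok.decode(out[0][prompt_len:], skip_special_tokens=False))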

The model is trained to open every response with a <think>...</think> reasoning block followed by the visible answer. If you do not see the tags, add the instruction "Begin every response with <think>...</think>" to the system prompt.
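If only the visible answer should reach the student, the reasoning block can be separated after decoding. The following is a minimal sketch that assumes the tags appear literally as <think> and </think> in the decoded text; split_reasoning is a hypothetical helper, not part of this repo.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer); reasoning is empty if no tags are found."""
    match = THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything after the closing tag is treated as the visible answer.
    answer = text[match.end():].strip()
    return reasoning, answer
```

The helper does not strip chat-template tokens, so decoding with skip_special_tokens=False may leave end-of-turn markers in the answer string.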

Limitations

Trained on synthetic dialogues. The tutor persona is consistent, but the model does not have access to specific exam-board mark schemes at inference time; pair it with a RAG layer over current AQA / Edexcel specifications if it is used in a high-stakes setting. The model can hallucinate quotes from set texts; verify any direct quotation.
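One way to approach quotation checking is a plain substring test against a copy of the set text that you supply yourself. The sketch below is an illustration under that assumption; unverified_quotes and its min_chars threshold are hypothetical, and it only catches double-quoted spans that are absent verbatim from the source.

```python
import re


def straighten(text: str) -> str:
    """Replace curly quotation marks with their straight equivalents."""
    return (
        text.replace("\u2018", "'").replace("\u2019", "'")
        .replace("\u201c", '"').replace("\u201d", '"')
    )


def normalise(text: str) -> str:
    """Straighten quotes, collapse whitespace, and lower-case for comparison."""
    return re.sub(r"\s+", " ", straighten(text)).strip().lower()


def unverified_quotes(answer: str, source_text: str, min_chars: int = 8) -> list[str]:
    """Return double-quoted spans from the answer that are not found in the source."""
    source = normalise(source_text)
    missing = []
    for quote in re.findall(r'"([^"]+)"', straighten(answer)):
        # Ignore trailing punctuation the model may have placed inside the quotes.
        candidate = normalise(quote).strip(".,;:!?")
        if len(candidate) >= min_chars and candidate not in source:
            missing.append(quote)
    return missing
```

Short quoted words are skipped by the length threshold, and paraphrases or quotes with altered spelling will be flagged, so a flagged quote still needs a human check.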

Training data

Multi-turn tutoring dialogues generated by the agentic-dataset-factory Player-Coach pipeline, covering set-text analysis, language techniques, and student-essay feedback for the UK GCSE English curriculum.

The data is not released alongside the weights pending a license review of source material referenced during generation. The generator (agentic-dataset-factory, Player-Coach adversarial pipeline) is open-source and reproducible.

Training procedure

  • Framework: Unsloth + TRL (SFT)
  • Precision: bfloat16
  • PEFT: LoRA, rank 16
  • Optimiser: AdamW (Unsloth defaults)
  • Epochs: 1, effective batch size 4, max-seq-length 2048
  • Hardware: NVIDIA DGX Spark GB10 (single device, unified memory)
  • Container: nvcr.io/nvidia/pytorch:25.11-py3
  • Verified-compatible pins: transformers==5.5.4, accelerate==1.10.0, trl==0.26.1, datasets==4.3.0, latest unsloth/unsloth_zoo/bitsandbytes.
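The listed hyperparameters could be gathered into trainer keyword arguments along the following lines. This is a sketch, not the runbook's actual configuration: the card gives only the effective batch size, so the split between per-device batch size and gradient accumulation is an assumption, and the key names follow TRL's SFTConfig arguments as I understand them (max_length was named max_seq_length in older TRL releases).

```python
LORA_RANK = 16  # from the card; other LoRA settings are not stated there


def sft_hyperparameters(per_device_batch: int = 1, effective_batch: int = 4) -> dict:
    """Return trainer kwargs whose batch settings multiply to the effective batch size."""
    if effective_batch % per_device_batch != 0:
        raise ValueError("effective_batch must be a multiple of per_device_batch")
    return {
        "num_train_epochs": 1,
        # Assumed split: the card does not say how the batch of 4 was reached.
        "per_device_train_batch_size": per_device_batch,
        "gradient_accumulation_steps": effective_batch // per_device_batch,
        "bf16": True,
        "max_length": 2048,
    }
```

On a single device the effective batch size is per_device_train_batch_size multiplied by gradient_accumulation_steps, so the defaults above give 1 x 4 = 4. The dictionary could then be unpacked into a trainer config together with an output directory of your choosing.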

Loss trajectory and reproduction recipe: see the source runbook in appmilla/agentic-dataset-factory, at domains/gcse-tutor/RUNBOOK-gcse-fine-tune.md.

Artefacts in this repo

  • / (root): merged-16bit weights (full standalone model, ~49 GB). Load with AutoModelForCausalLM.from_pretrained(repo_id).
  • lora-adapter/: the LoRA adapter (~1.9 GB). Apply on top of the base model with PeftModel.from_pretrained(base, repo_id, subfolder="lora-adapter").
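The adapter route could be wrapped as in the sketch below. It uses the PeftModel.from_pretrained call quoted in the list above; loading the tokenizer from this repo, the bfloat16 dtype, and the "cuda" device map are assumptions carried over from the sample prompt, and load_with_adapter is a hypothetical helper.

```python
BASE_ID = "unsloth/gemma-4-26b-a4b-it"
REPO_ID = "RichWoollcott/gcse-tutor-gemma4-26b-moe"
ADAPTER_SUBFOLDER = "lora-adapter"


def load_with_adapter(base_id: str = BASE_ID, repo_id: str = REPO_ID):
    """Load the base model, then attach the LoRA adapter stored in this repo."""
    # Imported here so the heavy dependencies are only needed when loading.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer shipped with this repo matches the base model.
    tok = AutoTokenizer.from_pretrained(repo_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id, torch_dtype=torch.bfloat16, device_map="cuda"
    )
    # The adapter weights live in a subfolder rather than at the repo root.
    model = PeftModel.from_pretrained(base, repo_id, subfolder=ADAPTER_SUBFOLDER)
    return tok, model
```

This downloads the base weights plus the ~1.9 GB adapter instead of the ~49 GB merged checkpoint, which may suit setups that already have the base model cached.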

GGUF quantisations (q4_k_m, BF16) live in a separate repo: RichWoollcott/gcse-tutor-gemma4-26b-moe-GGUF.

License

Released under the Gemma Terms of Use, as required by the base model.

Citation

If you use this model, please cite the base model, TRL, and Unsloth:

@misc{vonwerra2022trl,
  title  = {TRL: Transformer Reinforcement Learning},
  author = {Leandro von Werra and others},
  year   = 2020,
  url    = {https://github.com/huggingface/trl}
}