DSA Reasoning Coach: LoRA adapter (Qwen2.5-7B)

A LoRA adapter that turns Qwen2.5-7B-Instruct into a tutor that teaches you how to derive a Data Structures & Algorithms solution (the observations, the bottleneck, the key insight, the pattern) instead of dumping the code.

What it does

Two behaviors are internalized by the fine-tune. They appear even with a minimal system prompt ("You are a helpful DSA tutor."):

  1. Answers in a fixed 8-section teaching format: Observations → Brute force → Bottleneck → Key insight → Pattern → Optimized approach → Complexity → Generalizable lesson.
  2. Refuses to dump runnable code: it teaches the thinking, not the solution.
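Both behaviors can be spot-checked mechanically. The sketch below is hypothetical and not taken from the author's pipeline (the reported scores come from an LLM judge). It assumes each section name appears verbatim at the start of a line, optionally after markdown markers such as "##", "**" or "1.", and it treats any fenced block or Python-style def/class line as leaked code, which is only a crude proxy.

```python
import re

# The eight section names, in the order this card lists them.
SECTIONS = [
    "Observations", "Brute force", "Bottleneck", "Key insight",
    "Pattern", "Optimized approach", "Complexity", "Generalizable lesson",
]
FENCE = "`" * 3  # built indirectly so this snippet can itself sit inside a fenced block


def sections_in_order(text):
    """True if each section name starts a line (after optional '#', '*', '1.' markers), in order."""
    pos = 0
    for name in SECTIONS:
        pattern = re.compile(rf"^[ \t#*\d.)]*{re.escape(name)}\b", re.IGNORECASE | re.MULTILINE)
        match = pattern.search(text, pos)
        if match is None:
            return False
        pos = match.end()  # the next section must come after this one
    return True


def looks_like_code_leak(text):
    """Heuristic only: a code fence or a def/class line counts as leaked runnable code."""
    return FENCE in text or re.search(r"^\s*(def|class)\s+\w+", text, re.MULTILINE) is not None
```

A response passes when `sections_in_order(reply)` is true and `looks_like_code_leak(reply)` is false; anything subtler (a correct insight, a correct complexity) still needs a human or a judge model.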

Evaluation

16 held-out problems, no schema in the prompt, greedy decoding, LLM-as-judge (gpt-oss-120b via Cerebras). Base = stock Qwen2.5-7B-Instruct with the same minimal prompt.

Criterion                  Base   Fine-tuned 7B   Δ
Format adherence (/2)      0.00   1.94            +1.94
Insight correctness (/2)   1.56   1.56            +0.00
Complexity correct (/2)    0.44   1.50            +1.06
Answer not leaked (/1)     0.00   1.00            +1.00
TOTAL (/7)                 2.00   6.00            +4.00

With no schema in the prompt, the base model free-forms and dumps full code on all 16 problems; the fine-tune emits the teaching format and withholds code. That is a 3× total-score win (6.00 vs 2.00) with insight correctness at parity.
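As a consistency check on the table: each TOTAL is the plain sum of the per-criterion means, and the "3×" figure is the ratio of the two totals. This sketch uses only the reported means, since per-problem scores are not published.

```python
# Reported per-criterion means from the table: name -> (max points, base, fine-tuned).
TABLE = {
    "Format adherence": (2, 0.00, 1.94),
    "Insight correctness": (2, 1.56, 1.56),
    "Complexity correct": (2, 0.44, 1.50),
    "Answer not leaked": (1, 0.00, 1.00),
}

# Per-criterion improvement (fine-tuned minus base), rounded to two decimals like the table.
deltas = {name: round(tuned - base, 2) for name, (_, base, tuned) in TABLE.items()}

# Totals are sums of the per-criterion means; rounding removes float noise.
max_total = sum(max_pts for max_pts, _, _ in TABLE.values())
base_total = round(sum(base for _, base, _ in TABLE.values()), 2)
tuned_total = round(sum(tuned for _, _, tuned in TABLE.values()), 2)
ratio = tuned_total / base_total

for name, d in deltas.items():
    print(f"{name}: {d:+.2f}")
print(f"TOTAL (/{max_total}): {base_total:.2f} -> {tuned_total:.2f} ({ratio:.1f}x)")
```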

How it was trained

  • QLoRA (Unsloth) on a free Colab T4, LoRA r=16, 3 epochs, 73 distilled examples.
  • Distillation: training derivations generated by a frontier model, structurally filtered and human-reviewed for algorithm correctness.
  • Key trick (prompt augmentation): the system prompt was rotated per example over {full schema, minimal, none} so the behavior binds to the task, not the instruction text. This is what makes the format and the no-code-leak behavior survive a minimal prompt at inference.
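The prompt-augmentation step can be sketched as building chat-format records whose assistant target is identical no matter which system prompt is attached, so the model cannot learn to tie the format to the instruction text. This is an illustration under stated assumptions: the prompt strings, the names `build_example` and `augment`, and the round-robin rotation are hypothetical, since the card does not say whether rotation was cyclic or random.

```python
# Hypothetical prompt strings: the card does not publish the exact text it used.
FULL_SCHEMA = (
    "You are a DSA tutor. Answer in eight sections: Observations, Brute force, "
    "Bottleneck, Key insight, Pattern, Optimized approach, Complexity, "
    "Generalizable lesson. Do not write runnable code."
)
MINIMAL = "You are a helpful DSA tutor."
VARIANTS = ("full", "minimal", "none")


def build_example(problem, derivation, variant):
    """One chat-format training record; the assistant target never depends on the variant."""
    messages = []
    if variant == "full":
        messages.append({"role": "system", "content": FULL_SCHEMA})
    elif variant == "minimal":
        messages.append({"role": "system", "content": MINIMAL})
    # variant == "none": no system message at all
    messages.append({"role": "user", "content": problem})
    messages.append({"role": "assistant", "content": derivation})
    return {"messages": messages}


def augment(pairs):
    """Assign the three system-prompt variants round-robin, one variant per example."""
    return [
        build_example(problem, derivation, VARIANTS[i % len(VARIANTS)])
        for i, (problem, derivation) in enumerate(pairs)
    ]
```

With 73 examples, round-robin assignment gives 25 full-schema, 24 minimal and 24 no-prompt records. Random assignment would work the same way, just without the exact balance.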
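For a sense of scale, r=16 means each adapted weight of shape d_in × d_out gains two small matrices, A (r × d_in) and B (d_out × r), i.e. r·(d_in + d_out) trainable parameters. The card does not say which layers were adapted. The back-of-envelope count below assumes all seven linear projections per decoder layer (a common Unsloth choice) and Qwen2.5-7B's published config values (hidden size 3584, MLP size 18944, 28 layers, 4 key/value heads of dimension 128); both are assumptions, not facts from this card.

```python
# Assumed Qwen2.5-7B shapes (from its public config.json; not stated in this card).
HIDDEN = 3584          # hidden_size
INTERMEDIATE = 18944   # intermediate_size
LAYERS = 28            # num_hidden_layers
KV_DIM = 4 * 128       # num_key_value_heads * head_dim (grouped-query attention)

R = 16  # LoRA rank, as stated in the card

# Assumption: adapters on all seven linear projections, as (in_features, out_features).
PROJECTIONS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank, projections, layers):
    """Each adapted d_in x d_out weight adds A (rank x d_in) plus B (d_out x rank)."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in projections.values())
    return per_layer * layers


print(f"{lora_params(R, PROJECTIONS, LAYERS):,} trainable LoRA parameters")
```

Under these assumptions only about 40 million parameters are trained, roughly half a percent of the base model. Under QLoRA the base weights stay frozen in 4-bit, which is what lets the run fit on a single T4.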

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "MoistPotato/dsa-reasoning-coach-7b-lora"

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to("cuda")
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()   # optional: faster inference
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful DSA tutor."},   # minimal prompt on purpose
    {"role": "user",   "content": "Given a string, find the length of the longest substring "
                                  "without repeating characters."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt", return_dict=True).to("cuda")
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Limitations

  • Small held-out test set (16 problems), so results are directional.
  • The win is behavioral (format, no-code-leak, complexity) plus insight parity, not a claim that the model out-reasons frontier models on hard problems.
  • English only; targets classic interview-style DSA problems across ~16 patterns.
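To make "directional" concrete: the card reports no confidence intervals, but a Wilson score interval (a standard binomial interval, swapped in here and not part of the author's evaluation) shows how little 16 problems can pin down. Assuming "Answer not leaked" is scored 0/1, so that a 1.00 mean means 16/16, a perfect score is still consistent with a true no-leak rate of about 81% at 95% confidence.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95% coverage)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    margin = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clamp to [0, 1] to absorb floating-point noise at the extremes.
    return max(0.0, center - margin), min(1.0, center + margin)


# Assumption: the no-leak criterion is binary, so 1.00 over 16 problems means 16/16.
low, high = wilson_interval(16, 16)
print(f"16/16 no-leak: 95% CI [{low:.3f}, {high:.3f}]")
```

The same function applied to the base model's 0/16 gives an upper bound near 19%. The direction of the effect is clear, but the exact rates are not.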

License

Apache-2.0 (matches the base model).

