DSA Reasoning Coach — LoRA adapter (Qwen2.5-7B)

A LoRA adapter that turns Qwen2.5-7B-Instruct into a tutor that teaches you how to derive a Data Structures & Algorithms solution — observations, the bottleneck, the key insight, the pattern — instead of dumping the code.

Live demo: https://nikhitauppar8--dsa-reasoning-coach-ui.modal.run
GitHub (full pipeline + eval): https://github.com/Nick-2908/dsa-coach
Base model: Qwen/Qwen2.5-7B-Instruct

What it does

The fine-tune internalizes two behaviors, which appear even with a minimal system prompt ("You are a helpful DSA tutor."):

Answers in a fixed 8-section teaching format: Observations → Brute force → Bottleneck → Key insight → Pattern → Optimized approach → Complexity → Generalizable lesson.
Refuses to dump runnable code — it teaches the thinking, not the solution.
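As a rough illustration of what "format adherence" could mean in practice, the sketch below checks that a response contains the eight section names in the stated order. It is a hypothetical helper, not the checker used in the repository's eval; the function name `sections_in_order` and the assumption that each section name starts its own line (optionally after markdown markers or numbering) are mine.

```python
import re

# The eight section names, in the order the model card lists them.
SECTIONS = [
    "Observations",
    "Brute force",
    "Bottleneck",
    "Key insight",
    "Pattern",
    "Optimized approach",
    "Complexity",
    "Generalizable lesson",
]


def sections_in_order(response: str) -> bool:
    """Return True if every section name starts a line, in the expected order."""
    pos = 0
    for name in SECTIONS:
        # Assumption: a heading is the section name at the start of a line,
        # optionally preceded by markers such as '## ', '**' or '1. '.
        pattern = re.compile(
            r"^[#*\s\d.]*" + re.escape(name) + r"\b",
            re.IGNORECASE | re.MULTILINE,
        )
        match = pattern.search(response, pos)
        if match is None:
            return False
        pos = match.end()  # the next section must appear after this one
    return True


if __name__ == "__main__":
    demo = "\n".join(f"## {name}\n(some explanation)" for name in SECTIONS)
    print(sections_in_order(demo))
```

A check like this only covers structure. It says nothing about whether the insight or the complexity analysis is correct, which is why the evaluation scores those separately.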

Evaluation

Setup: 16 held-out problems, no schema in the prompt, greedy decoding, scored by an LLM judge (Cerebras gpt-oss-120b). Base = stock Qwen2.5-7B-Instruct with the same minimal prompt.

Criterion	Base	Fine-tuned 7B	Δ
Format adherence (/2)	0.00	1.94	+1.94
Insight correctness (/2)	1.56	1.56	+0.00
Complexity correct (/2)	0.44	1.50	+1.06
Answer not leaked (/1)	0.00	1.00	+1.00
TOTAL (/7)	2.00	6.00	+4.00

With no schema in the prompt, the base model answers free-form and dumps full code on all 16 problems. The fine-tune emits the teaching format and withholds code, tripling the total score (2.00 to 6.00) while insight correctness stays level at 1.56.
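The totals can be re-derived from the per-criterion rows of the table above. The snippet below is only an arithmetic check on the published numbers; the variable names are illustrative.

```python
# Per-criterion scores copied from the table above: (base, fine-tuned).
scores = {
    "Format adherence (/2)": (0.00, 1.94),
    "Insight correctness (/2)": (1.56, 1.56),
    "Complexity correct (/2)": (0.44, 1.50),
    "Answer not leaked (/1)": (0.00, 1.00),
}

# Sum each column, rounding to the table's two decimal places.
base_total = round(sum(base for base, _ in scores.values()), 2)
tuned_total = round(sum(tuned for _, tuned in scores.values()), 2)

delta = round(tuned_total - base_total, 2)
ratio = tuned_total / base_total

print(f"base={base_total:.2f} tuned={tuned_total:.2f} "
      f"delta={delta:+.2f} ratio={ratio:.1f}x")
```

The whole gain comes from format, complexity, and the no-leak criterion. The insight row contributes nothing to the delta.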

How it was trained

QLoRA (Unsloth) on a free Colab T4, LoRA r=16, 3 epochs, 73 distilled examples.
Distillation: training derivations generated by a frontier model, structurally filtered and human-reviewed for algorithm correctness.
Key trick, prompt augmentation: the system prompt was rotated per example over {full schema, minimal, none} so the behavior binds to the task rather than to the instruction text. This is what lets the format and the no-code-leak behavior survive a minimal prompt at inference.
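The prompt rotation described above could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the project's actual data pipeline: the wording of `FULL_SCHEMA_PROMPT`, the helper name `build_chat_examples`, and the round-robin order are all hypothetical (the source only says the prompt was rotated per example).

```python
from itertools import cycle

# Hypothetical wording for the full-schema prompt; the real text is not given here.
FULL_SCHEMA_PROMPT = (
    "You are a DSA tutor. Answer in eight sections: Observations, Brute force, "
    "Bottleneck, Key insight, Pattern, Optimized approach, Complexity, "
    "Generalizable lesson. Do not write runnable code."
)
MINIMAL_PROMPT = "You are a helpful DSA tutor."

# The three variants named above: full schema, minimal, none.
SYSTEM_PROMPTS = [FULL_SCHEMA_PROMPT, MINIMAL_PROMPT, None]


def build_chat_examples(pairs):
    """Turn (problem, derivation) pairs into chat examples with a rotating system prompt."""
    examples = []
    # Assumption: a simple round-robin; random sampling would serve the same purpose.
    for (problem, derivation), system in zip(pairs, cycle(SYSTEM_PROMPTS)):
        messages = []
        if system is not None:  # the "none" variant omits the system turn entirely
            messages.append({"role": "system", "content": system})
        messages.append({"role": "user", "content": problem})
        messages.append({"role": "assistant", "content": derivation})
        examples.append(messages)
    return examples


if __name__ == "__main__":
    demo_pairs = [(f"problem {i}", f"derivation {i}") for i in range(3)]
    for example in build_chat_examples(demo_pairs):
        print([turn["role"] for turn in example])
```

Because the target derivation is the same whichever system prompt precedes it, the model cannot rely on the schema text being present, which is the effect the training note describes.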

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "MoistPotato/dsa-reasoning-coach-7b-lora"

tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to("cuda")
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload()   # optional: faster inference
model.eval()

messages = [
    {"role": "system", "content": "You are a helpful DSA tutor."},   # minimal prompt on purpose
    {"role": "user",   "content": "Given a string, find the length of the longest substring "
                                  "without repeating characters."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt", return_dict=True).to("cuda")
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))

Limitations

Small held-out test set (16 problems) — results are directional.
The win is behavioral (format, no-code-leak, complexity) plus insight parity, not a claim of out-reasoning frontier models on hard problems.
English only; targets classic interview-style DSA problems across ~16 patterns.

License

Apache-2.0 (matches the base model).
