DSA Reasoning Coach: LoRA adapter (Qwen2.5-7B)
A LoRA adapter that turns Qwen2.5-7B-Instruct into a tutor that teaches you how to derive a Data Structures & Algorithms solution (observations, the bottleneck, the key insight, the pattern) instead of dumping the code.
- Live demo: https://nikhitauppar8--dsa-reasoning-coach-ui.modal.run
- GitHub (full pipeline + eval): https://github.com/Nick-2908/dsa-coach
- Base model: Qwen/Qwen2.5-7B-Instruct
What it does
The fine-tune internalizes two behaviors, which appear even with a minimal system
prompt ("You are a helpful DSA tutor."):
- Answers in a fixed 8-section teaching format: Observations → Brute force → Bottleneck → Key insight → Pattern → Optimized approach → Complexity → Generalizable lesson.
- Refuses to dump runnable code; it teaches the thinking, not the solution.
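The two behaviors above can be spot-checked mechanically. The following is a minimal sketch, not the project's evaluation code: `check_response` is a hypothetical helper, it assumes section names appear verbatim (case-insensitive) in the reply, and it treats a Markdown code fence as a proxy for "dumped code".

```python
# Section names from the 8-section teaching format, in the expected order.
SECTIONS = [
    "Observations",
    "Brute force",
    "Bottleneck",
    "Key insight",
    "Pattern",
    "Optimized approach",
    "Complexity",
    "Generalizable lesson",
]

FENCE = "`" * 3  # Markdown code-fence marker


def check_response(text: str) -> dict:
    """Heuristic check: sections present in order, and no fenced code."""
    lowered = text.lower()
    found = 0
    start = 0
    for name in SECTIONS:
        # Search only after the previous match so that order is enforced.
        idx = lowered.find(name.lower(), start)
        if idx == -1:
            break
        found += 1
        start = idx + len(name)
    return {
        "sections_found": found,
        "all_sections_in_order": found == len(SECTIONS),
        "has_code_fence": FENCE in text,
    }
```

A substring match is deliberately loose; a stricter check would anchor on heading syntax once the exact output layout is known.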
Evaluation
Evaluated on 16 held-out problems with no schema in the prompt, greedy decoding, and an LLM-as-judge
(Cerebras gpt-oss-120b). Base = stock Qwen2.5-7B-Instruct with the same minimal prompt.
| Criterion | Base | Fine-tuned 7B | Δ |
|---|---|---|---|
| Format adherence (/2) | 0.00 | 1.94 | +1.94 |
| Insight correctness (/2) | 1.56 | 1.56 | +0.00 |
| Complexity correct (/2) | 0.44 | 1.50 | +1.06 |
| Answer not leaked (/1) | 0.00 | 1.00 | +1.00 |
| TOTAL (/7) | 2.00 | 6.00 | +4.00 |
With no schema in the prompt, the base model free-forms and dumps full code on all 16 problems; the fine-tune emits the teaching format and refuses code. That is a 3× total-score win while holding insight parity.
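As a quick sanity check, the totals and the 3× figure follow directly from the per-criterion means in the table. The snippet below only re-adds the reported numbers; it does not reproduce the evaluation itself.

```python
# (base, fine-tuned) mean scores per criterion, copied from the table.
scores = {
    "Format adherence": (0.00, 1.94),
    "Insight correctness": (1.56, 1.56),
    "Complexity correct": (0.44, 1.50),
    "Answer not leaked": (0.00, 1.00),
}

# Round to two decimals to match the table's precision.
base_total = round(sum(base for base, _ in scores.values()), 2)
tuned_total = round(sum(tuned for _, tuned in scores.values()), 2)
ratio = tuned_total / base_total

print(f"base={base_total:.2f} tuned={tuned_total:.2f} ratio={ratio:.1f}x")
```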
How it was trained
- QLoRA (Unsloth) on a free Colab T4, LoRA r=16, 3 epochs, 73 distilled examples.
- Distillation: training derivations generated by a frontier model, structurally filtered and human-reviewed for algorithm correctness.
- Key trick (prompt augmentation): the system prompt was rotated per example over
{full schema, minimal, none} so the behavior binds to the task, not the instruction text. This is what makes the format + no-code-leak survive a minimal prompt at inference.
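The prompt rotation described above might look roughly like the sketch below. This is an illustration under assumptions, not the actual pipeline: the round-robin order, the `FULL_SCHEMA` wording, the example dict keys (`problem`, `derivation`), and the helper names are all hypothetical. Only the three prompt variants and the minimal prompt text come from this card.

```python
from itertools import cycle

MINIMAL = "You are a helpful DSA tutor."
# Hypothetical full-schema prompt; the real wording is not given in this card.
FULL_SCHEMA = (
    MINIMAL
    + " Answer using these sections: Observations, Brute force, Bottleneck, "
    "Key insight, Pattern, Optimized approach, Complexity, Generalizable lesson."
)
# None means the example is built with no system message at all.
SYSTEM_PROMPTS = [FULL_SCHEMA, MINIMAL, None]


def build_chat(example: dict, system_prompt):
    """Build one chat-format training example with an optional system prompt."""
    messages = []
    if system_prompt is not None:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": example["problem"]})
    messages.append({"role": "assistant", "content": example["derivation"]})
    return messages


def augment(examples: list) -> list:
    """Assign system prompts round-robin (assumed; random sampling also fits)."""
    return [build_chat(ex, sp) for ex, sp in zip(examples, cycle(SYSTEM_PROMPTS))]
```

Because the target output stays the same across all three variants, the model cannot rely on the schema text being present, which matches the stated goal of binding the behavior to the task.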
Usage
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "Qwen/Qwen2.5-7B-Instruct"
ADAPTER = "MoistPotato/dsa-reasoning-coach-7b-lora"
tok = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16).to("cuda")
model = PeftModel.from_pretrained(model, ADAPTER)
model = model.merge_and_unload() # optional: faster inference
model.eval()
messages = [
    {"role": "system", "content": "You are a helpful DSA tutor."},  # minimal prompt on purpose
    {"role": "user", "content": "Given a string, find the length of the longest substring "
                                "without repeating characters."},
]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt", return_dict=True).to("cuda")
out = model.generate(**inputs, max_new_tokens=1024, do_sample=False)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
Limitations
- Small held-out test set (16 problems), so results are directional.
- The win is behavioral (format, no-code-leak, complexity) plus insight parity, not a claim of out-reasoning frontier models on hard problems.
- English only; targets classic interview-style DSA problems across ~16 patterns.
License
Apache-2.0 (matches the base model).