Instructions to use kennethp97/sft-arm-a-1p5b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use kennethp97/sft-arm-a-1p5b with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-1.5B-Instruct") model = PeftModel.from_pretrained(base_model, "kennethp97/sft-arm-a-1p5b") - Notebooks
- Google Colab
- Kaggle
Phase-2 Arm-A SFT -- LoRA adapter on Qwen2.5-1.5B-Instruct
A LoRA adapter on Qwen/Qwen2.5-1.5B-Instruct, trained with plain supervised
fine-tuning on the train_registry v0.4.0 procedural-compliance corpus.
Each training row is one half of a flip or anchor pair: a short reasoning that
cites the deciding clause, ending in FINAL ANSWER: <compliant|non-compliant>.
What it does
Given a procedure and a scenario, the model emits an EDGE CHECKS: reasoning
block followed by a FINAL ANSWER: compliant|non-compliant line. The recipe
targets the free-form regime; gains concentrate there.
Headline eval (frozen 233-process held-out; 128 flip + 122 anchor; greedy / T=0)
| regime | flip rate (base -> SFT) | anchor acc (base -> SFT) | plain (base -> SFT) |
|---|---|---|---|
| forced | 0.117 -> 0.188 | 0.557 -> 0.582 | 0.570 -> 0.608 |
| free-form | 0.219 -> 0.469 (+25.0pp) | 0.467 -> 0.664 (+19.7pp) | 0.576 -> 0.726 |
The lift is free-form-only (the regime the reasoning recipe targets); the gains concentrate on exception / hierarchy / threshold handles, while step-ordering stays flat (0.200 -> 0.225) -- the known structural bottleneck.
How to use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
BASE = "Qwen/Qwen2.5-1.5B-Instruct"
ADAPTER = "kennethp97/sft-arm-a-1p5b"
tok = AutoTokenizer.from_pretrained(BASE, use_fast=True)
tok.pad_token = tok.pad_token or tok.eos_token
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16,
device_map="auto")
model = PeftModel.from_pretrained(base, ADAPTER)
model.eval()
Prompt format and a worked side-by-side eval against the base and the companion
DPO adapter (kennethp97/dpo-flip-1p5b) are in the combined eval notebook.
Training summary
- Base:
Qwen/Qwen2.5-1.5B-Instruct - LoRA r=32 alpha=64 on q/k/v/o/gate/up/down
- Plain SFT (cross-entropy on the chosen completion), full bf16
- Training set: 3,734 rows (after filtering 1,226 placeholder-
verifier_reasonrows from the 5,020-row v0.4.0 corpus)
Limitations
- Research checkpoint, not a production classifier. Below the pre-registered absolute GO bar.
- Step-ordering bottleneck. Ordering flip stays nearly flat.
- Free-form is the target regime. Forced-verdict gains are small.
- Format sensitivity. Trained on the
EDGE CHECKS ... FINAL ANSWERformat above; deviation may degrade performance. Greedy (T=0) matches the reported numbers.
License
Adapter: Apache-2.0. Base model: under the Qwen2.5-1.5B-Instruct license.
- Downloads last month
- 16