PhyChip-SmolLM3-3B-base-SFT

LoRA adapter for PhyChip - a study of using ngspice as a verifiable RL reward to train a 1-3B language model to design analog circuits from natural-language specs.

Base model: HuggingFaceTB/SmolLM3-3B-Base
Stage: Supervised fine-tuning (SFT) of the base model. SFT teacher: codex / gpt-5.5.
Role in the study: Shows that SFT lifts the base model from 0/40 to 16/40 on phy-chip-bench-v1, making SFT the dominant lever.

Evaluation (pass@1, greedy, ngspice + 23 spec harnesses)

benchmark	pass@1
phy-chip-bench-v1 (40)	16/40 (40.0%)
phy-chip-bench-v2 (50, novel circuits)	0/50 (0.0%)
AnalogCoder (24 textbook)	20/24 (83.3%)
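With a single greedy generation per task, pass@1 reduces to the fraction of tasks whose generation passes its spec checks. The sketch below shows only that bookkeeping; the function names are illustrative and are not taken from the PhyChip code.

```python
def pass_at_1(results):
    """Return (passed, total, percent) for one greedy generation per task.

    `results` is a list of booleans, one per benchmark task
    (True means the generation passed its spec checks).
    """
    passed = sum(1 for ok in results if ok)
    total = len(results)
    percent = 100.0 * passed / total if total else 0.0
    return passed, total, percent


def format_score(results):
    """Format a score the way the table above does, e.g. '16/40 (40.0%)'."""
    passed, total, percent = pass_at_1(results)
    return f"{passed}/{total} ({percent:.1f}%)"


# Reproduces the phy-chip-bench-v1 row: 16 passes out of 40 tasks.
print(format_score([True] * 16 + [False] * 24))  # 16/40 (40.0%)
```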

Adapter: LoRA (PEFT), rank r=16, alpha=32.
Eval protocol: pass@1, greedy decoding (temperature=0); each generation simulated in ngspice and scored by 23 deterministic spec-check harnesses (the same code for every model).
Decontamination: AnalogCoder is eval-only (never in training data). phy-chip-bench-v2 was built with an 8-gram overlap gate (overlap had to stay below 0.40; the maximum observed overlap was 0.018).
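The 8-gram overlap gate can be illustrated with a small sketch. The card does not say how text is tokenized or which corpus the candidates were compared against, so the whitespace tokenization, the `reference_corpus` argument, and all function names below are assumptions made for illustration.

```python
def ngrams(tokens, n=8):
    """Return the set of n-grams (as tuples) in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(candidate, reference_corpus, n=8):
    """Fraction of the candidate's n-grams that also occur in the reference corpus.

    Assumption: whitespace tokenization. Returns 0.0 when the candidate
    is shorter than n tokens and therefore has no n-grams.
    """
    cand = ngrams(candidate.split(), n)
    if not cand:
        return 0.0
    seen = set()
    for doc in reference_corpus:
        seen |= ngrams(doc.split(), n)
    return len(cand & seen) / len(cand)


def passes_gate(candidate, reference_corpus, threshold=0.40, n=8):
    """Accept the candidate only if its overlap stays below the threshold."""
    return overlap_fraction(candidate, reference_corpus, n) < threshold
```

Under this definition, a candidate that shares one of its two 8-grams with the reference corpus scores 0.5 and is blocked, while a candidate with no shared 8-grams scores 0.0 and passes.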

How to load

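from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model in bfloat16; device_map="auto" requires the accelerate package.
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base", torch_dtype="bfloat16", device_map="auto")
# Attach the LoRA adapter weights on top of the base model.
model = PeftModel.from_pretrained(base, "NithinReddyG/PhyChip-SmolLM3-3B-base-SFT")
# The tokenizer comes from the base model repository.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base")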
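Once the model and tokenizer are loaded, a greedy generation in the spirit of the eval protocol might look like the sketch below. The helper name, the `max_new_tokens` default, and the example prompt are assumptions; the card does not document the prompt format used in training or evaluation.

```python
def generate_greedy(model, tok, prompt, max_new_tokens=512):
    """Greedy-decode a completion and return only the newly generated text."""
    # Tokenize and move the tensors to the model's device.
    inputs = tok(prompt, return_tensors="pt").to(model.device)
    # do_sample=False selects greedy decoding (the temperature=0 setting).
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens so only the completion is decoded.
    prompt_len = inputs["input_ids"].shape[1]
    return tok.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

For example, `generate_greedy(model, tok, "Design a common-source amplifier with a gain of at least 20 dB.")` would return the model's completion for that (hypothetical) spec.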

The base-vs-instruct finding (why this model exists)

This adapter is one of six in a controlled base-vs-instruct study for PhyChip (analog-circuit design with ngspice as a verifiable RL reward). The same SFT/RL recipe was applied starting from SmolLM3-3B-Base and from SmolLM3-3B (instruct):

start	+ SFT (bench_v1)	+ SFT + GRPO (AnalogCoder)
base	0 -> 16/40 (SFT lifts it)	22/24 (RL generalizes)
instruct	0 -> 0/40 (SFT collapses it)	0/24 (RL cannot recover)

SFT on the instruct model degenerated into syntactically valid but device-less SPICE ("resistor salad": netlists with no active devices), a case of catastrophic forgetting on an extreme out-of-distribution domain. RL could not climb out: the verifiable reward stayed stuck at about 0.05. Conclusion: fine-tune the base model, not the instruct model. This supports the canonical base -> SFT -> RL ordering with a direct before/after measurement.
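The "device-less SPICE" failure mode can be detected with a simple netlist scan. The sketch below relies on the SPICE convention that the first line is a title and that an element's type is given by the first letter of its name. Treating M (MOSFET), Q (BJT), J (JFET), D (diode), and X (subcircuit instance) as "active" is an assumption of this sketch, as are the function names; it is not the check used in the PhyChip harnesses.

```python
# Element-name prefixes treated as active devices (an assumption of this sketch).
ACTIVE_PREFIXES = ("M", "Q", "J", "D", "X")


def active_device_lines(netlist):
    """Return the element lines of a SPICE netlist that declare active devices."""
    devices = []
    # Skip the first line: by SPICE convention it is the title.
    for raw in netlist.splitlines()[1:]:
        line = raw.strip()
        # Ignore blanks, comments (*), dot-directives (.), and continuations (+).
        if not line or line[0] in "*.+":
            continue
        # SPICE is case-insensitive, so compare the upper-cased first letter.
        if line[0].upper() in ACTIVE_PREFIXES:
            devices.append(line)
    return devices


def is_device_less(netlist):
    """True if the netlist has no active devices (the 'resistor salad' case)."""
    return not active_device_lines(netlist)
```

A resistor divider with only V and R elements would be flagged as device-less, while a netlist containing a line such as `M1 out in 0 0 nmos W=1u L=180n` would not.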

License

CC-BY-NC-SA-4.0 (research / educational use). The adapter follows the PhyChip project license; the base model carries its own license.
