PhyChip-SmolLM3-3B-base-SFT

LoRA adapter for PhyChip - a study of using ngspice as a verifiable RL reward to train a 1-3B language model to design analog circuits from natural-language specs.

  • Base model: HuggingFaceTB/SmolLM3-3B-Base
  • Stage: Supervised fine-tuning (SFT) of the base model. SFT teacher: codex / gpt-5.5.
  • Role in the study: Shows that SFT lifts the base model from 0 to 16/40 on phy-chip-bench-v1, making SFT the dominant lever.

Evaluation (pass@1, greedy, ngspice + 23 spec harnesses)

benchmark                                 pass@1
phy-chip-bench-v1 (40)                    16/40 (40.0%)
phy-chip-bench-v2 (50, novel circuits)    0/50 (0%)
AnalogCoder (24 textbook)                 20/24 (83.3%)

  • Adapter: LoRA (PEFT), rank r=16, alpha=32.
  • Eval protocol: pass@1, greedy decoding (temperature=0); each generation simulated in ngspice and scored by 23 deterministic spec-check harnesses (the same code for every model).
  • Decontamination: AnalogCoder is eval-only (never in training data). phy-chip-bench-v2 was built with an 8-gram overlap gate (block < 0.40; achieved max overlap 0.018).
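The 8-gram overlap gate mentioned above can be illustrated with a minimal sketch. The exact overlap definition used to build phy-chip-bench-v2 is not given here, so everything below is an assumption: whitespace tokenization, overlap measured as the fraction of a candidate's 8-grams that also occur in the training corpus, and 0.40 read as the threshold a candidate must stay under. The function names are hypothetical.

```python
def ngrams(text, n=8):
    """Return the set of n-grams over whitespace tokens (assumed tokenization)."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def overlap_fraction(candidate, corpus, n=8):
    """Fraction of the candidate's n-grams that appear in any corpus document."""
    cand = ngrams(candidate, n)
    if not cand:
        # Candidates shorter than n tokens have no n-grams to compare.
        return 0.0
    seen = set()
    for doc in corpus:
        seen |= ngrams(doc, n)
    return len(cand & seen) / len(cand)


def passes_gate(candidate, corpus, threshold=0.40, n=8):
    """Keep a candidate only if its overlap is strictly below the threshold (assumed reading)."""
    return overlap_fraction(candidate, corpus, n) < threshold
```

Under this reading, the reported maximum overlap of 0.018 means that even the benchmark item closest to the training data shares under 2% of its 8-grams with it.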

How to load

# Load the base model first, then attach the LoRA adapter on top of it.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "NithinReddyG/PhyChip-SmolLM3-3B-base-SFT")
# The tokenizer is loaded from the base model repository.
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B-Base")
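Once the adapter is loaded, generation with greedy decoding (matching the evaluation protocol) might look like the sketch below. This is not the study's inference script. The helper name generate_netlist, the max_new_tokens default, and passing the raw spec text as the prompt are all assumptions, since the prompt template used for SFT is not given here.

```python
def generate_netlist(model, tok, spec, max_new_tokens=512):
    """Greedy-decode a completion for a natural-language circuit spec.

    `model` and `tok` are the objects created in the loading snippet.
    The raw spec is used as the prompt (an assumption; the training
    prompt format may differ).
    """
    inputs = tok(spec, return_tensors="pt").to(model.device)
    # do_sample=False selects greedy decoding, the temperature=0 setting.
    output_ids = model.generate(**inputs, do_sample=False, max_new_tokens=max_new_tokens)
    # Drop the prompt tokens so that only the newly generated text is returned.
    prompt_len = inputs["input_ids"].shape[1]
    return tok.decode(output_ids[0][prompt_len:], skip_special_tokens=True)
```

A call such as generate_netlist(model, tok, "Design a common-source amplifier with ...") would return the generated text, which can then be simulated in ngspice. The spec string here is purely illustrative.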

The base-vs-instruct finding (why this model exists)

This adapter is one of six in a controlled base-vs-instruct study for PhyChip (analog-circuit design with ngspice as a verifiable RL reward). The same SFT/RL recipe was applied starting from SmolLM3-3B-Base and from SmolLM3-3B (instruct):

start       + SFT (bench_v1)                + SFT + GRPO (AnalogCoder)
base        0 -> 16/40 (SFT lifts it)       22/24 (RL generalizes)
instruct    0 -> 0/40 (SFT collapses it)    0/24 (RL cannot recover)

SFT on the instruct model degenerated into syntactically valid but device-less SPICE ("resistor salad": netlists with no active devices), a case of catastrophic forgetting on an extreme out-of-distribution domain. RL could not climb out: the verifiable reward stayed stuck at about 0.05. Conclusion: fine-tune the base model, not the instruct model. This confirms the canonical base -> SFT -> RL ordering with a direct before/after measurement.
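The "resistor salad" failure mode described above can be detected with a simple netlist scan. The sketch below is illustrative only and is not one of the study's spec-check harnesses. It relies on the standard SPICE convention that the first letter of an element line names the device type (M for MOSFET, Q for BJT, J for JFET, Z for MESFET) and that the first line of a netlist is a title. The function name, and the choice not to count subcircuit instances (X) as active, are assumptions.

```python
ACTIVE_PREFIXES = ("m", "q", "j", "z")  # MOSFET, BJT, JFET, MESFET element letters


def has_active_device(netlist, prefixes=ACTIVE_PREFIXES):
    """Return True if any element line in the netlist declares an active device."""
    # SPICE treats the first line as a title, so it is skipped here.
    for line in netlist.splitlines()[1:]:
        stripped = line.strip()
        # Skip blank lines, comments (*), directives (.), and continuations (+).
        if not stripped or stripped[0] in "*.+":
            continue
        if stripped[0].lower() in prefixes:
            return True
    return False
```

A netlist made only of resistors and sources returns False under this check, matching the device-less outputs described above. A check like this could serve as a cheap pre-filter before simulation.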

License

CC-BY-NC-SA-4.0 (research / educational use). The adapter inherits PhyChip's project license posture; the base model carries its own license.
