Instructions to use NithinReddyG/PhyChip-SmolLM3-3B-instruct-GRPO-v1 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- PEFT
How to use NithinReddyG/PhyChip-SmolLM3-3B-instruct-GRPO-v1 with PEFT:
from peft import PeftModel from transformers import AutoModelForCausalLM base_model = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B") model = PeftModel.from_pretrained(base_model, "NithinReddyG/PhyChip-SmolLM3-3B-instruct-GRPO-v1") - Notebooks
- Google Colab
- Kaggle
PhyChip-SmolLM3-3B-instruct-GRPO-v1
LoRA adapter for PhyChip - a study of using ngspice as a verifiable RL reward to train a 1-3B language model to design analog circuits from natural-language specs.
- Base model:
HuggingFaceTB/SmolLM3-3B - Stage: instruct -> SFT -> GRPO (RLVR, ngspice reward, KL 0.01, 150-step budget). Checkpoint-100.
- Role in the study: Negative result: RL on top of the collapsed instruct-SFT cannot recover circuit ability (reward stuck ~0.05).
Evaluation (pass@1, greedy, ngspice + 23 spec harnesses)
| benchmark | pass@1 |
|---|---|
| phy-chip-bench-v1 (40) | 0/40 (0%) |
| phy-chip-bench-v2 (50, novel circuits) | 0/50 (0%) |
| AnalogCoder (24 textbook) | 0/24 (0%) |
- Adapter: LoRA (PEFT), rank
r=16,alpha=32. - Eval protocol: pass@1, greedy decoding (
temperature=0); each generation simulated in ngspice and scored by 23 deterministic spec-check harnesses (the same code for every model). - Decontamination: AnalogCoder is eval-only (never in training data).
phy-chip-bench-v2was built with an 8-gram overlap gate (block < 0.40; achieved max overlap 0.018).
How to load
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("HuggingFaceTB/SmolLM3-3B", torch_dtype="bfloat16", device_map="auto")
model = PeftModel.from_pretrained(base, "NithinReddyG/PhyChip-SmolLM3-3B-instruct-GRPO-v1")
tok = AutoTokenizer.from_pretrained("HuggingFaceTB/SmolLM3-3B")
The base-vs-instruct finding (why this model exists)
This adapter is one of six in a controlled base-vs-instruct study for PhyChip (analog-circuit design with ngspice as a verifiable RL reward). The same SFT/RL recipe was applied starting from SmolLM3-3B-Base and from SmolLM3-3B (instruct):
| start | + SFT (bench_v1) | + SFT + GRPO (AnalogCoder) |
|---|---|---|
| base | 0 -> 16/40 (SFT lifts it) | 22/24 (RL generalizes) |
| instruct | 0 -> 0/40 (SFT collapses it) | 0/24 (RL cannot recover) |
SFT on the instruct model degenerated into syntactically valid but device-less SPICE (resistor-salad, no active devices) - catastrophic forgetting on an extreme out-of-distribution domain. RL could not climb out (verifiable reward stuck ~0.05). Conclusion: fine-tune the base model, not the instruct model. This confirms the canonical base -> SFT -> RL ordering with a direct before/after measurement.
License
CC-BY-NC-SA-4.0 (research / educational use). The adapter inherits PhyChip's project license
posture; the base model carries its own license.
- Downloads last month
- 18