VLSI-Gemma-v3

A Gemma 4 26B (MoE) model fine-tuned via Reinforcement Learning (GRPO) specifically for VLSI design and verification tasks. Trained on 547 VLSI questions across 7 domains.

HF repo: https://huggingface.co/vxkyyy/vlsi-gemma-v3

Model Details

Property	Value
Base model	`google/gemma-4-26b-a4b-it`
Training method	GRPO (Group Relative Policy Optimization)
Platform	Castform
Parameters	26B total (4B active, MoE)
Adapter type	LoRA (rank 128, alpha 256)
Best checkpoint	Step 159 (eval correct 0.896)
Total training steps	279

Training Data

547 questions across 7 VLSI domains:

Domain	Count
Analog/Mixed-Signal	140
Digital RTL Design	119
Verification (UVM/SystemVerilog)	68
PDK / Physical Design	66
Synthesis / STA	56
General VLSI	56
Mixed Hard Problems	42

Performance

Metrics on held-out eval set (110 questions):

Metric	Value
Correct reward (mean)	0.896
Correct reward (max@8)	0.942
Total reward	1.291
Quality reward	0.298
Structure reward	0.049
Syntax (compile) reward	0.018

Comparison vs Qwen 3.5-4B baseline:

+11.6% improvement over v2 Qwen eval correct (0.789 → 0.880)
Best-of-8 eval: 0.942 correct pass rate

Usage

Quick start (with LoRA adapter)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "google/gemma-4-26b-a4b-it"
adapter_path = "./vlsi-gemma-v3-checkpoint-159"

# Load base model
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(base_model, adapter_path)

# Ask a VLSI question
prompt = "Write synthesizable Verilog for a 4-bit synchronous counter with parallel load and active-low reset."
inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0]))

Merge and export

merged = model.merge_and_unload()
merged.save_pretrained("./vlsi-gemma-v3-merged")
tokenizer.save_pretrained("./vlsi-gemma-v3-merged")

Training Details

Environment: Custom VLSI Q&A environment with 5 reward components:
- Correctness (keyword matching against ground truth) — weight: 1.0
- Verilog syntax (compile-check via pyverilog parser) — weight: 0.02
- Code quality (presence of code blocks) — weight: 0.05
- Answer quality (technical depth, vocabulary density) — weight: 0.3
- Structure (proper formatting) — weight: 0.05
Learning rate: 1e-5
Group size: 9 rollouts per prompt
Epochs: 10
Hardware: NVIDIA A100/H100 GPUs

License

Apache 2.0

Downloads last month: -