VLSI-Gemma-v3
A Gemma 4 26B mixture-of-experts (MoE) model fine-tuned with reinforcement learning (GRPO) for VLSI design and verification tasks. It was trained on 547 VLSI questions across 7 domains.
HF repo: https://huggingface.co/vxkyyy/vlsi-gemma-v3
Model Details
| Property | Value |
|---|---|
| Base model | google/gemma-4-26b-a4b-it |
| Training method | GRPO (Group Relative Policy Optimization) |
| Platform | Castform |
| Parameters | 26B total (4B active, MoE) |
| Adapter type | LoRA (rank 128, alpha 256) |
| Best checkpoint | Step 159 (eval correct 0.896) |
| Total training steps | 279 |
Training Data
547 questions across 7 VLSI domains:
| Domain | Count |
|---|---|
| Analog/Mixed-Signal | 140 |
| Digital RTL Design | 119 |
| Verification (UVM/SystemVerilog) | 68 |
| PDK / Physical Design | 66 |
| Synthesis / STA | 56 |
| General VLSI | 56 |
| Mixed Hard Problems | 42 |
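The split can be checked with a few lines of arithmetic. The sketch below copies the counts from the table, confirms they sum to 547, and prints each domain's share of the set. The names `DOMAIN_COUNTS` and `domain_shares` are illustrative and not part of the released training code.

```python
# Domain counts copied from the table above.
DOMAIN_COUNTS = {
    "Analog/Mixed-Signal": 140,
    "Digital RTL Design": 119,
    "Verification (UVM/SystemVerilog)": 68,
    "PDK / Physical Design": 66,
    "Synthesis / STA": 56,
    "General VLSI": 56,
    "Mixed Hard Problems": 42,
}


def domain_shares(counts):
    """Return each domain's fraction of the total question count."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


if __name__ == "__main__":
    print(f"total questions: {sum(DOMAIN_COUNTS.values())}")
    for name, share in domain_shares(DOMAIN_COUNTS).items():
        print(f"{name:<34} {share:6.1%}")
```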
Performance
Metrics on held-out eval set (110 questions):
| Metric | Value |
|---|---|
| Correct reward (mean) | 0.896 |
| Correct reward (max@8) | 0.942 |
| Total reward | 1.291 |
| Quality reward | 0.298 |
| Structure reward | 0.049 |
| Syntax (compile) reward | 0.018 |
Comparison vs Qwen 3.5-4B baseline:
- +11.6% improvement over v2 Qwen eval correct (0.789 → 0.880)
- Best-of-8 eval: 0.942 correct pass rate
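The card does not spell out how the mean and max@8 figures are aggregated. A common reading, assumed here, is that the mean averages the correctness reward over every rollout, while max@8 keeps only the best of the 8 rollouts per prompt and then averages over prompts. The function names and the toy rewards below are illustrative and are not taken from the evaluation run.

```python
from statistics import mean


def mean_at_k(rewards):
    """Average reward over every rollout of every prompt."""
    return mean(r for group in rewards for r in group)


def max_at_k(rewards):
    """Average, over prompts, of the best rollout in each group."""
    return mean(max(group) for group in rewards)


# Toy data: 2 prompts with 4 rollouts each (made-up values, for illustration only).
toy_rewards = [
    [1.0, 0.5, 0.0, 0.5],
    [0.25, 0.75, 0.5, 0.5],
]

if __name__ == "__main__":
    print("mean@k:", mean_at_k(toy_rewards))
    print("max@k: ", max_at_k(toy_rewards))
```

Under this reading max@k can never be lower than the mean, which matches the ordering of the two reported values.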
Usage
Quick start (with LoRA adapter)
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "google/gemma-4-26b-a4b-it"
adapter_path = "./vlsi-gemma-v3-checkpoint-159"  # local directory holding the LoRA adapter

# Load the tokenizer and base model; device_map="auto" places the weights on the available devices
tokenizer = AutoTokenizer.from_pretrained(base_model_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    device_map="auto",
    torch_dtype="auto",
)

# Apply the LoRA adapter on top of the base weights
model = PeftModel.from_pretrained(base_model, adapter_path)
model.eval()

# Ask a VLSI question
prompt = "Write synthesizable Verilog for a 4-bit synchronous counter with parallel load and active-low reset."
# Move the inputs to the model's device instead of hard-coding "cuda"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
# Drop special tokens from the decoded text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
Merge and export
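# Fold the LoRA weights into the base model and drop the PEFT wrapper
merged = model.merge_and_unload()
# Save the merged weights and tokenizer side by side so the directory loads without PEFT
merged.save_pretrained("./vlsi-gemma-v3-merged")
tokenizer.save_pretrained("./vlsi-gemma-v3-merged")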
Training Details
- Environment: Custom VLSI Q&A environment with 5 reward components:
  - Correctness (keyword matching against ground truth), weight: 1.0
  - Verilog syntax (compile-check via pyverilog parser), weight: 0.02
  - Code quality (presence of code blocks), weight: 0.05
  - Answer quality (technical depth, vocabulary density), weight: 0.3
  - Structure (proper formatting), weight: 0.05
- Learning rate: 1e-5
- Group size: 9 rollouts per prompt
- Epochs: 10
- Hardware: NVIDIA A100/H100 GPUs
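To make the reward setup concrete, the sketch below combines the five components as a weighted sum and then computes group-relative advantages in the usual GRPO way, by standardising each rollout's reward against the other rollouts for the same prompt. This rests on assumptions: the card does not state that the total is a plain weighted sum, that each component score lies in [0, 1], or which normalisation the training platform applies. The names `REWARD_WEIGHTS`, `total_reward`, and `group_relative_advantages` are hypothetical.

```python
from statistics import mean, pstdev

# Weights copied from the reward list above; the dictionary keys are illustrative.
REWARD_WEIGHTS = {
    "correctness": 1.0,
    "verilog_syntax": 0.02,
    "code_quality": 0.05,
    "answer_quality": 0.3,
    "structure": 0.05,
}


def total_reward(scores, weights=REWARD_WEIGHTS):
    """Weighted sum of per-component scores (assumed to lie in [0, 1]).

    Components missing from `scores` contribute zero.
    """
    return sum(w * scores.get(name, 0.0) for name, w in weights.items())


def group_relative_advantages(rewards, eps=1e-6):
    """Standardise rewards within one prompt's rollout group (GRPO-style)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every rollout scores the same.
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    perfect = {name: 1.0 for name in REWARD_WEIGHTS}
    print("max total reward:", total_reward(perfect))
    print("advantages:", group_relative_advantages([1.3, 0.9, 0.2, 1.3]))
```

With these weights correctness dominates the total, so the smaller terms mainly break ties between rollouts of similar correctness.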
License
Apache 2.0