spec2rtl-qwen32b-lora-rl-v2
A QLoRA adapter (r=16) for Qwen2.5-Coder-32B-Instruct fine-tuned for Verilog RTL generation from natural-language hardware specifications.
This adapter is the RL stage (GRPO v2) of a two-stage training pipeline: SFT on 13,568 examples (spec-to-RTL, editing, and debugging) → GRPO reinforcement learning with an iverilog compile reward. It is the Generator component in the Spec2RTL agentic system, which pairs it with Claude Sonnet 4.6 as a Reflector to iteratively self-correct generated Verilog.
GitHub repo: https://github.com/Noahsabb/spec2RTL
Benchmark Results
Evaluated on CVDP cid003 — 78 RTL natural-language-spec-to-code problems, scored with the full cocotb simulation harness (functional correctness, not just syntax).
| System | Overall | Easy (41) | Medium (37) |
|---|---|---|---|
| Base Qwen2.5-Coder-32B-Instruct | 14.10% (11/78) | 21.95% | 5.41% |
| + SFT fine-tuning | 19.23% (15/78) | 24.39% | 13.51% |
| + RL GRPO v2 (this adapter) | 29.49% (23/78) | 36.59% | 21.62% |
| + Agentic loop v10 (Qwen+Sonnet reflector) | 53.85% (42/78) | 70.73% | 35.14% |
| Final system (agentic v10+v11 cherry-pick) | 58.97% (46/78) | 75.61% | 40.54% |
| Claude Sonnet 4.6 standalone (baseline) | 55.13% (43/78) | — | — |
The final agentic system, with this adapter as the Generator, beats standalone Claude Sonnet 4.6 by 3.84 percentage points (58.97% vs. 55.13%).
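The percentages in the table follow directly from the pass counts out of 78. The short sketch below recomputes them and the quoted gaps; the dictionary keys are illustrative names, not identifiers from the project, and the gaps are taken between the rounded percentages, as the card does.

```python
# Pass counts out of 78 problems, copied from the results table.
TOTAL = 78
passed = {
    "base": 11,
    "sft": 15,
    "rl_grpo_v2": 23,
    "agentic_v10": 42,
    "final_system": 46,
    "sonnet_standalone": 43,
}


def pass_rate(count: int, total: int = TOTAL) -> float:
    """Pass rate in percent, rounded to two decimals as in the table."""
    return round(100 * count / total, 2)


def delta_pp(a: str, b: str) -> float:
    """Gap between two systems in percentage points (a minus b),
    computed from the rounded rates."""
    return round(pass_rate(passed[a]) - pass_rate(passed[b]), 2)


rates = {name: pass_rate(n) for name, n in passed.items()}
print(rates)
print(delta_pp("final_system", "sonnet_standalone"))  # 3.84
print(delta_pp("agentic_v10", "rl_grpo_v2"))          # 24.36
```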
Model Details
- Base model: Qwen/Qwen2.5-Coder-32B-Instruct
- Adapter type: LoRA (via PEFT)
- LoRA rank: r=16, alpha=32, dropout=0.05
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Trainable parameters: 134,217,728 / 32,898,094,080 (0.408%)
- Adapter size: ~513 MB
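The trainable-parameter count can be reproduced from the LoRA rank and the shapes of the targeted projections. The sketch below assumes the Qwen2.5-32B architecture dimensions (hidden size 5120, MLP intermediate size 27648, 64 layers, 8 key/value heads of dimension 128); these are not stated in this card and should be checked against the base model's config.json.

```python
# ASSUMPTION: architecture dimensions of Qwen2.5-32B, not stated in this card.
HIDDEN = 5120
INTERMEDIATE = 27648
NUM_LAYERS = 64
KV_DIM = 8 * 128  # 8 key/value heads x head dimension 128 (grouped-query attention)
RANK = 16         # LoRA rank from Model Details

# (in_features, out_features) for each targeted linear layer
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features: int, out_features: int, rank: int) -> int:
    """LoRA adds A (rank x in_features) and B (out_features x rank)."""
    return rank * (in_features + out_features)


per_layer = sum(lora_params(i, o, RANK) for i, o in TARGET_SHAPES.values())
trainable = per_layer * NUM_LAYERS
TOTAL_PARAMS = 32_898_094_080  # total reported in Model Details

print(trainable)                                  # 134217728
print(f"{100 * trainable / TOTAL_PARAMS:.3f}%")   # 0.408%
```

Under these assumed dimensions the result matches the 134,217,728 trainable parameters listed above.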
Training Pipeline
Stage 1 — SFT (separate adapter, not in this repo):
- Dataset: 13,568 examples built from shailja/Verilog_GitHub (~7,500 validated Verilog modules)
- Task types: spec-to-RTL (8,128), editing (4,015), debugging (1,425)
- Config: QLoRA r=32, α=64, 5 epochs, lr=1e-4, seq_len=4096
- Infrastructure: 1× H100 80GB, ~21h wall time
Stage 2 — GRPO RL (this adapter):
- Starting point: SFT adapter merged into base weights; fresh r=16 LoRA head
- Reward: tiered iverilog compile signal — hard fail 0.0, soft fail (malformed) 0.2, clean compile 1.0
- Config: G=2 completions, max_new_tokens=256, lr=5e-6, 3 epochs
- Infrastructure: 1× H100 80GB, ~5.5h wall time
- Training compile rate: 7–10%, which indicates the reward signal is informative rather than saturated (the task is not trivially solved)
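A minimal sketch of what a tiered compile reward and a group-relative advantage could look like for G=2. The card does not define the tiers precisely, so the mapping here is an assumption: no complete module/endmodule pair counts as a hard fail, a module that iverilog rejects counts as a soft fail, and a module that iverilog accepts counts as a clean compile. The function names are hypothetical, and the advantage normalization is one common GRPO formulation, not necessarily the one used in training.

```python
import re
import statistics
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List

# Reward tiers from the card.
HARD_FAIL, SOFT_FAIL, CLEAN = 0.0, 0.2, 1.0

MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def iverilog_compiles(verilog: str) -> bool:
    """True if `iverilog -t null` (check only, no output file) accepts the source."""
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "dut.v"
        src.write_text(verilog)
        result = subprocess.run(
            ["iverilog", "-t", "null", str(src)],
            capture_output=True, text=True, timeout=30,
        )
    return result.returncode == 0


def tiered_reward(completion: str,
                  compiles: Callable[[str], bool] = iverilog_compiles) -> float:
    # ASSUMPTION: this tier mapping is an interpretation, not taken from the project.
    match = MODULE_RE.search(completion)
    if match is None:
        return HARD_FAIL          # nothing that looks like a complete module
    return CLEAN if compiles(match.group(0)) else SOFT_FAIL


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one prompt's group of completions."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]
```

With only two completions per prompt, the advantages are roughly +1 and -1 when the rewards differ and zero when they match, so a prompt contributes a learning signal only when its two samples land in different tiers.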
Agentic Loop (for full system results)
This adapter serves as the Generator in a Reflector–Generator loop:
- Generator (this adapter) produces initial Verilog from spec
- Compiler (iverilog) checks syntax → Reflector (Claude Sonnet 4.6) diagnoses errors → Generator repairs
- Simulator (cocotb harness) checks functional correctness → Reflector diagnoses → Generator repairs
- Loop runs up to 3 compile iterations + 4 cocotb iterations
The +24.36pp improvement from RL v2 (29.49%) to agentic v10 (53.85%) comes from the Reflector providing structured, testbench-aware diagnosis at each iteration.
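The control flow of the loop can be sketched as two repair phases run back to back. This is an illustration under stated assumptions, not the project's implementation: `generate`, `reflect`, `repair`, `compile_check`, and `sim_check` are hypothetical callables standing in for the adapter, Claude Sonnet 4.6, iverilog, and the cocotb harness, and "up to N iterations" is read here as up to N repair attempts per phase.

```python
from typing import Callable, Tuple

# A check returns (passed, log); e.g. a wrapper around iverilog or cocotb.
Check = Callable[[str], Tuple[bool, str]]


def repair_phase(rtl: str, spec: str, check: Check,
                 reflect: Callable[[str, str, str], str],
                 repair: Callable[[str, str, str], str],
                 max_iters: int) -> Tuple[str, bool]:
    """Run check -> reflect -> repair until the check passes or the budget runs out."""
    for _ in range(max_iters):
        ok, log = check(rtl)
        if ok:
            return rtl, True
        diagnosis = reflect(spec, rtl, log)   # Reflector explains the failure
        rtl = repair(spec, rtl, diagnosis)    # Generator rewrites the RTL
    ok, _ = check(rtl)                        # verify the last repair
    return rtl, ok


def spec_to_rtl(spec: str, generate: Callable[[str], str],
                reflect, repair, compile_check: Check, sim_check: Check,
                max_compile_iters: int = 3,
                max_sim_iters: int = 4) -> Tuple[str, bool]:
    rtl = generate(spec)
    rtl, compiled = repair_phase(rtl, spec, compile_check, reflect, repair,
                                 max_compile_iters)
    if not compiled:
        return rtl, False
    return repair_phase(rtl, spec, sim_check, reflect, repair, max_sim_iters)
```

Passing the checks and models in as callables keeps the loop testable with stubs and makes it easy to swap the Reflector or the simulator.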
How to Use
Load adapter for inference
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
base_model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
adapter_id = "Noahsabb/spec2rtl-qwen32b-lora-rl-v2"
# Load base model in bf16 (requires ~65GB VRAM — fits a single H100 or A100 80GB)
tokenizer = AutoTokenizer.from_pretrained(adapter_id) # tokenizer is included in adapter repo
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload() # merge LoRA into base for faster inference
model = model.to("cuda:0")
model.eval()
Generate Verilog from a specification
spec = """
## Specification
Design a synchronous 4-bit up-counter with active-high reset.
- Inputs: clk (clock), rst (synchronous reset, active high), en (count enable)
- Outputs: count [3:0] (counter value)
- Behavior: On rising clock edge, if rst is high, count resets to 0.
If en is high and rst is low, count increments by 1, wrapping from 15 to 0.
"""
prompt = f"Generate synthesizable Verilog RTL for the following specification.\n\n{spec}"
messages = [{"role": "user", "content": prompt}]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )
generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(generated)
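Chat-tuned models often wrap code in markdown fences or surround it with prose, so the decoded text may need cleaning before it goes to iverilog. The helper below is a hypothetical sketch, not part of the project: it prefers fenced blocks when present and returns every complete module/endmodule span it finds.

```python
import re

FENCE = "`" * 3  # markdown code fence, built here to keep this example readable

# Body of any fenced block, with an optional language tag after the opening fence.
FENCE_RE = re.compile(FENCE + r"[a-zA-Z]*[ \t]*\n(.*?)" + FENCE, re.DOTALL)
MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def extract_verilog(generated: str) -> str:
    """Return all complete modules in the text, or '' if none are found."""
    fenced = FENCE_RE.findall(generated)
    candidates = fenced if fenced else [generated]
    modules = []
    for text in candidates:
        modules.extend(MODULE_RE.findall(text))
    return "\n\n".join(m.strip() for m in modules)
```

For example, `extract_verilog(generated)` can be applied to the decoded output above before writing it to a `.v` file. An empty result means no complete module was produced, which can happen if generation was truncated.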
Memory-efficient inference (if VRAM is limited)
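Skipping merge_and_unload() does not make the model fit on a smaller GPU: the bf16 base weights still need ~65GB, and an unmerged adapter uses slightly more memory and adds a little latency per token, although it avoids the one-time merge cost. For GPUs with less than 65GB VRAM, load the base model quantized (for example, 4-bit through bitsandbytes) or shard it across several GPUs with device_map="auto", and attach the adapter without merging. The benchmark numbers above may not carry over to quantized inference.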
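As a rough guide to what fits where, the weights-only footprint can be estimated from the parameter count, and one common way to reduce it is 4-bit loading through bitsandbytes. The sketch below assumes bitsandbytes is installed alongside transformers and peft; `weight_memory_gb` and `load_4bit` are hypothetical helper names, and the estimates exclude activations, the KV cache, and quantization overhead.

```python
def weight_memory_gb(n_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


N_PARAMS = 32_898_094_080  # total parameter count reported in Model Details

print(round(weight_memory_gb(N_PARAMS, 16), 1))  # bf16:  65.8
print(round(weight_memory_gb(N_PARAMS, 4), 1))   # 4-bit: 16.4


def load_4bit(base_model_id: str, adapter_id: str):
    """Load the base model in 4-bit NF4 and attach the adapter unmerged."""
    # Imports are local so the estimate above runs without the GPU stack installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    base = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        quantization_config=quant_config,
        device_map="auto",  # place layers on the available GPU(s)
    )
    # Keep the LoRA weights separate instead of merging into quantized weights.
    model = PeftModel.from_pretrained(base, adapter_id)
    return model.eval()
```

Calling `load_4bit("Qwen/Qwen2.5-Coder-32B-Instruct", "Noahsabb/spec2rtl-qwen32b-lora-rl-v2")` returns a model that can be used with the generation code above; output quality under quantization has to be checked separately.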
Limitations
- Single-shot pass rate is 29.49%: the adapter is designed for use in an agentic loop, not standalone generation. Raw single-shot results are well below the agentic system's 58.97%.
- Training reward is compile-only: the RL reward signal checks whether iverilog compiles the output, not functional correctness. The model learns to produce compilable Verilog but not necessarily correct Verilog.
- Complex multi-bug problems still fail: problems that involve precise timing, multi-cycle FSM coordination, or ambiguous specs need targeted feedback from the Reflector.
- Max 256 tokens during RL training: the RL generator was trained with a short max_new_tokens for compute reasons. Inference with longer outputs (up to 2048 tokens) works but falls outside the training distribution.
Citation
This adapter was developed as part of a course project (CS153, Stanford University) implementing NVIDIA's ACE-RTL system at academic scale.
@misc{spec2rtl2026,
  author = {Sabbavarapu, Noah},
  title = {Spec2RTL: Fine-tuned Qwen2.5-Coder-32B + Agentic Self-Correction for Verilog RTL Generation},
  year = {2026},
  url = {https://github.com/Noahsabb/spec2RTL}
}
Related work:
- ACE-RTL: arXiv:2602.10218
- CVDP Benchmark: arXiv:2506.14074
- Qwen2.5-Coder: arXiv:2409.12186