spec2rtl-qwen32b-lora-rl-v2

A QLoRA adapter (r=16) for Qwen2.5-Coder-32B-Instruct fine-tuned for Verilog RTL generation from natural-language hardware specifications.

This adapter is the RL stage (GRPO v2) of a two-stage training pipeline: SFT on 13,568 examples (spec-to-RTL, editing, and debugging) → GRPO reinforcement learning with an iverilog compile reward. It is the Generator component in the Spec2RTL agentic system, which pairs it with Claude Sonnet 4.6 as a Reflector to iteratively self-correct generated Verilog.

GitHub repo: https://github.com/Noahsabb/spec2RTL

Benchmark Results

Evaluated on CVDP cid003 — 78 RTL natural-language-spec-to-code problems, scored with the full cocotb simulation harness (functional correctness, not just syntax).

System	Overall	Easy (41)	Medium (37)
Base Qwen2.5-Coder-32B-Instruct	14.10% (11/78)	21.95%	5.41%
+ SFT fine-tuning	19.23% (15/78)	24.39%	13.51%
+ RL GRPO v2 (this adapter)	29.49% (23/78)	36.59%	21.62%
+ Agentic loop v10 (Qwen+Sonnet reflector)	53.85% (42/78)	70.73%	35.14%
Final system (agentic v10+v11 cherry-pick)	58.97% (46/78)	75.61%	40.54%
Claude Sonnet 4.6 standalone (baseline)	55.13% (43/78)	—	—

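With this adapter as the Generator, the final agentic system (58.97%, 46/78) scores 3.84 pp above Claude Sonnet 4.6 standalone (55.13%, 43/78). The agentic loop v10 on its own (53.85%, 42/78) is 1.28 pp below that baseline.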
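The percentages and percentage-point gaps in the table follow directly from the pass counts out of 78. A minimal sketch that recomputes them; the dictionary keys are illustrative labels, not names used by the project:

```python
# Pass counts out of 78 problems, copied from the results table.
TOTAL = 78
passed = {
    "base": 11,
    "sft": 15,
    "rl_grpo_v2": 23,
    "agentic_v10": 42,
    "final_system": 46,
    "sonnet_standalone": 43,
}


def pass_rate(count: int, total: int = TOTAL) -> float:
    """Pass rate in percent, rounded to two decimals as in the table."""
    return round(100.0 * count / total, 2)


def delta_pp(a: str, b: str) -> float:
    """Percentage-point difference between two systems (a minus b)."""
    return round(pass_rate(passed[a]) - pass_rate(passed[b]), 2)


rates = {name: pass_rate(n) for name, n in passed.items()}
for name, rate in rates.items():
    print(f"{name:18s} {rate:6.2f}%")
```

Here `delta_pp("final_system", "sonnet_standalone")` gives 3.84 and `delta_pp("agentic_v10", "rl_grpo_v2")` gives 24.36, matching the gaps quoted in this card.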

Model Details

Base model: Qwen/Qwen2.5-Coder-32B-Instruct
Adapter type: LoRA (via PEFT)
LoRA rank: r=16, alpha=32, dropout=0.05
Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
Trainable parameters: 134,217,728 / 32,898,094,080 (0.408%)
Adapter size: ~513 MB
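The trainable-parameter count can be reproduced from the LoRA rank and the shapes of the targeted layers. The sketch below assumes the layer dimensions from the published Qwen2.5-32B configuration (hidden size 5120, MLP size 27648, 64 layers, 8 key/value heads of dimension 128); these values are not stated in this card, so treat them as assumptions.

```python
# Assumed Qwen2.5-32B shapes (taken from the published model config, not from this card).
HIDDEN = 5120
INTERMEDIATE = 27648
NUM_LAYERS = 64
KV_DIM = 8 * 128  # 8 key/value heads x head_dim 128 (grouped-query attention)
RANK = 16

# (in_features, out_features) for each targeted linear layer in one decoder block.
TARGETS = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(rank: int, shapes: dict, num_layers: int) -> int:
    """Each adapted layer adds A (rank x in) and B (out x rank) matrices."""
    per_layer = sum(rank * (n_in + n_out) for n_in, n_out in shapes.values())
    return per_layer * num_layers


trainable = lora_params(RANK, TARGETS, NUM_LAYERS)
total = 32_898_094_080  # base + adapter parameters, as reported above
fraction_pct = round(100 * trainable / total, 3)
fp32_mib = trainable * 4 / 2**20  # storage if the adapter is saved in fp32

print(trainable, fraction_pct, fp32_mib)
```

Under these assumptions the count comes out to 134,217,728 parameters (0.408% of the total), and fp32 storage would be 512 MiB, which is in line with the reported adapter size if the weights are saved in fp32.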

Training Pipeline

Stage 1 — SFT (separate adapter, not in this repo):

Dataset: 13,568 examples built from shailja/Verilog_GitHub (~7,500 validated Verilog modules)
Task types: spec-to-RTL (8,128), editing (4,015), debugging (1,425)
Config: QLoRA r=32, α=64, 5 epochs, lr=1e-4, seq_len=4096
Infrastructure: 1× H100 80GB, ~21h wall time

Stage 2 — GRPO RL (this adapter):

Starting point: SFT adapter merged into base weights; fresh r=16 LoRA adapter trained on top
Reward: tiered iverilog compile signal — hard fail 0.0, soft fail (malformed) 0.2, clean compile 1.0
Config: G=2 completions per prompt, max_new_tokens=256, lr=5e-6, 3 epochs
Infrastructure: 1× H100 80GB, ~5.5h wall time
Training compile rate: 7–10%, which indicates the reward is far from saturated (the task is not trivially solved)
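One way to read the tiered reward is sketched below. The tier boundaries are an assumption: here a completion with no complete `module ... endmodule` block counts as a hard fail, and a complete module that iverilog rejects counts as a soft fail. The project's actual definitions may differ. The function names are hypothetical, and the compiler check is injectable so the scoring logic can be exercised without iverilog installed.

```python
import os
import re
import subprocess
import tempfile
from typing import Callable

# A complete module definition: "module" keyword through the first "endmodule".
MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def iverilog_compiles(code: str) -> bool:
    """Return True if iverilog accepts the code. Requires iverilog on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "dut.v")
        out = os.path.join(tmp, "dut.vvp")
        with open(src, "w") as f:
            f.write(code)
        result = subprocess.run(
            ["iverilog", "-o", out, src], capture_output=True, text=True
        )
        return result.returncode == 0


def tiered_reward(
    completion: str, compiles: Callable[[str], bool] = iverilog_compiles
) -> float:
    """Map a completion to 0.0 / 0.2 / 1.0 (tier definitions are assumed)."""
    match = MODULE_RE.search(completion)
    if match is None:
        return 0.0  # hard fail (assumed): nothing that can be compiled
    if compiles(match.group(0)):
        return 1.0  # clean compile
    return 0.2  # soft fail (assumed): module present but rejected by iverilog
```

With a 256-token generation cap, truncated modules that never reach `endmodule` would fall into the lowest tier under this reading.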

Agentic Loop (for full system results)

This adapter serves as the Generator in a Reflector–Generator loop:

Generator (this adapter) produces initial Verilog from spec
Compiler (iverilog) checks syntax → Reflector (Claude Sonnet 4.6) diagnoses errors → Generator repairs
Simulator (cocotb harness) checks functional correctness → Reflector diagnoses → Generator repairs
Loop runs up to 3 compile iterations + 4 cocotb iterations

The +24.36pp improvement from RL v2 (29.49%) to agentic v10 (53.85%) comes from the Reflector providing structured, testbench-aware diagnosis at each iteration.
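The control flow of the loop described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: every callable is a hypothetical stand-in, the two phases are assumed to run sequentially, and an "iteration" is taken to mean one check followed, on failure, by one diagnose-and-repair step.

```python
from typing import Callable, Optional, Tuple

# A checker returns (passed, log). Both checkers are hypothetical stand-ins
# for the iverilog compile step and the cocotb harness.
Check = Callable[[str], Tuple[bool, str]]
Generate = Callable[[str, Optional[str], Optional[str]], str]
Reflect = Callable[[str, str, str], str]


def repair_phase(
    spec: str, code: str, check: Check, generate: Generate, reflect: Reflect, budget: int
) -> Tuple[str, bool]:
    """Run up to `budget` diagnose-and-repair rounds, then verify the last repair."""
    for _ in range(budget):
        ok, log = check(code)
        if ok:
            return code, True
        diagnosis = reflect(spec, code, log)   # Reflector explains the failure
        code = generate(spec, code, diagnosis)  # Generator repairs
    ok, _ = check(code)
    return code, ok


def run_loop(
    spec: str,
    generate: Generate,
    reflect: Reflect,
    compile_check: Check,
    sim_check: Check,
    max_compile_iters: int = 3,
    max_sim_iters: int = 4,
) -> Tuple[str, bool]:
    code = generate(spec, None, None)  # initial attempt, no feedback yet
    code, compiled = repair_phase(
        spec, code, compile_check, generate, reflect, max_compile_iters
    )
    if not compiled:
        return code, False  # assumed: skip simulation if it never compiles
    return repair_phase(spec, code, sim_check, generate, reflect, max_sim_iters)
```

Passing the checkers and models in as callables keeps the loop independent of any particular compiler, simulator, or API client.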

How to Use

Load adapter for inference

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-Coder-32B-Instruct"
adapter_id = "Noahsabb/spec2rtl-qwen32b-lora-rl-v2"

# Load base model in bf16 (requires ~65GB VRAM — fits a single H100 or A100 80GB)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)  # tokenizer is included in adapter repo
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
)
model = PeftModel.from_pretrained(model, adapter_id)
model = model.merge_and_unload()   # merge LoRA into base for faster inference
model = model.to("cuda:0")
model.eval()

Generate Verilog from a specification

spec = """
## Specification

Design a synchronous 4-bit up-counter with active-high reset.
- Inputs: clk (clock), rst (synchronous reset, active high), en (count enable)
- Outputs: count [3:0] (counter value)
- Behavior: On rising clock edge, if rst is high, count resets to 0.
  If en is high and rst is low, count increments by 1, wrapping from 15 to 0.
"""

prompt = f"Generate synthesizable Verilog RTL for the following specification.\n\n{spec}"
messages = [{"role": "user", "content": prompt}]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
inputs = tokenizer(text, return_tensors="pt").to("cuda:0")

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=2048,
        temperature=0.2,
        do_sample=True,
        pad_token_id=tokenizer.eos_token_id,
    )

generated = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)
print(generated)
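The decoded response may wrap the Verilog in a Markdown code fence or surround it with prose, so a post-processing step is usually needed before handing it to a compiler. A minimal sketch, assuming that response format; `extract_verilog` is a hypothetical helper, not part of this repository:

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker, built here to keep the literal out of the source

# Fenced block with an optional Verilog language tag.
FENCE_RE = re.compile(
    FENCE + r"(?:verilog|systemverilog|sv|v)?[ \t]*\n(.*?)" + FENCE,
    re.DOTALL | re.IGNORECASE,
)
# A complete module definition.
MODULE_RE = re.compile(r"\bmodule\b.*?\bendmodule\b", re.DOTALL)


def extract_verilog(text: str) -> str:
    """Return all complete modules found in a model response, or '' if none."""
    fenced = FENCE_RE.findall(text)
    # Prefer fenced code; fall back to scanning the whole response.
    source = "\n".join(fenced) if fenced else text
    modules = MODULE_RE.findall(source)
    return "\n\n".join(m.strip() for m in modules)
```

An empty return value signals that the response contained no complete module, for example because generation was cut off before `endmodule`.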

Memory-efficient inference (if VRAM is limited)

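Skipping merge_and_unload() does not reduce memory: the bf16 base weights still need about 65 GB, and the unmerged adapter adds its own weights on top plus a small per-token latency cost. For GPUs with less than 65 GB of VRAM, load the base model quantized instead (for example 4-bit via bitsandbytes, passed to from_pretrained as quantization_config) and attach the adapter with PeftModel.from_pretrained without merging. Results for quantized inference are not reported in this card.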
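A back-of-envelope estimate of weight memory shows where the ~65 GB figure comes from and how much a 4-bit base would save. The sketch assumes the reported total of 32,898,094,080 parameters includes the adapter, and it counts weights only (no activations, KV cache, or quantization metadata).

```python
ADAPTER_PARAMS = 134_217_728
BASE_PARAMS = 32_898_094_080 - ADAPTER_PARAMS  # assumed: reported total includes the adapter


def weight_gb(num_params: int, bits_per_param: float) -> float:
    """Weight storage in decimal GB; ignores activations and KV cache."""
    return num_params * bits_per_param / 8 / 1e9


bf16_merged = weight_gb(BASE_PARAMS, 16)
bf16_unmerged = bf16_merged + weight_gb(ADAPTER_PARAMS, 16)
int4_base = weight_gb(BASE_PARAMS, 4)

print(f"bf16, adapter merged:   {bf16_merged:.1f} GB")
print(f"bf16, adapter unmerged: {bf16_unmerged:.1f} GB")
print(f"4-bit base weights:     {int4_base:.1f} GB")
```

The merged and unmerged bf16 footprints differ by well under 1 GB, so merging is a latency choice rather than a memory one; the real saving comes from quantizing the base weights.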

Limitations

Single-shot pass rate is 29.49%: the adapter is designed for use in an agentic loop, not standalone generation, and its single-shot results are well below the agentic system's 58.97%.
Training reward is compile-only: the RL reward checks whether iverilog accepts the output, not functional correctness. The model learns to produce compilable Verilog but not necessarily correct Verilog.
Complex multi-bug problems still fail: problems that involve precise timing, multi-cycle FSM coordination, or ambiguous specs depend on targeted feedback from the Reflector.
Max 256 tokens during RL training: completions were capped at max_new_tokens=256 for compute reasons. Longer outputs at inference (up to 2048 tokens) work but fall outside the RL training distribution.

Citation

This adapter was developed as part of a course project (CS153, Stanford University) implementing NVIDIA's ACE-RTL system at academic scale.

@misc{spec2rtl2026,
  author = {Sabbavarapu, Noah},
  title  = {Spec2RTL: Fine-tuned Qwen2.5-Coder-32B + Agentic Self-Correction for Verilog RTL Generation},
  year   = {2026},
  url    = {https://github.com/Noahsabb/spec2RTL}
}

Related work: