Update README.md
README.md CHANGED
```diff
@@ -18,6 +18,8 @@ base_model:
 2025.05.17: Initial release of VeriReason-Qwen2.5-3B-Verilog-RTL-GRPO-reasoning-tb
 
 ## Project Description
+
+This is the model from the paper "VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation".
 This study introduces VeriReason, a novel approach utilizing reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation.
 
 The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis in a smaller model size. By using our curated high-quality training examples alongside a feedback-driven reward model, this 3B parameter model delivers exceptional performance on Verilog generation tasks while maintaining efficiency.
@@ -78,4 +80,4 @@ The GRPO (Generative Reinforcement Learning from Preference Optimization) traini
 ## Citation
 
 ## Acknowledgement
-This repo benefits from OpenR1 and LLamaFactory.
+This repo benefits from OpenR1 and LLamaFactory.
```
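The added project description centers on reinforcement learning driven by testbench feedback. As a rough sketch of how such a reward signal can be computed (an illustration under assumptions, not the authors' published pipeline: the Icarus Verilog toolchain, the file layout, and the PASS/FAIL logging convention are all placeholders), a generated module can be compiled and simulated against a reference testbench and scored by its pass rate:

```python
# Hypothetical testbench-feedback reward, sketched for illustration.
# Assumptions: Icarus Verilog is installed, and the testbench prints one
# "PASS"/"FAIL" line per check; neither is specified by the README above.
import subprocess
import tempfile
from pathlib import Path

def testbench_reward(verilog_src: str, testbench_src: str, timeout_s: int = 30) -> float:
    """Compile generated RTL against a testbench; return a reward in [0, 1]."""
    with tempfile.TemporaryDirectory() as tmp:
        design = Path(tmp) / "design.v"
        tb = Path(tmp) / "tb.v"
        sim = Path(tmp) / "sim.out"
        design.write_text(verilog_src)
        tb.write_text(testbench_src)

        # A design that fails to compile gets zero reward.
        compiled = subprocess.run(
            ["iverilog", "-o", str(sim), str(design), str(tb)],
            capture_output=True, text=True,
        )
        if compiled.returncode != 0:
            return 0.0

        try:
            run = subprocess.run(
                ["vvp", str(sim)], capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return 0.0  # runaway simulations are treated as failures

        # Fraction of testbench checks that passed.
        lines = run.stdout.splitlines()
        passed = sum("PASS" in ln for ln in lines)
        total = passed + sum("FAIL" in ln for ln in lines)
        return passed / total if total else 0.0
```

A graded pass rate, rather than a binary compile check, gives the policy a denser signal, which is the kind of feedback a GRPO-style update can exploit.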
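Since the commit documents a released checkpoint, a minimal generation sketch with Hugging Face `transformers` may also be useful; the repository id below is a placeholder (the actual Hub path is not stated in this diff), and no special prompt template is assumed:

```python
# Minimal inference sketch. The repo id is a placeholder; substitute the
# checkpoint's actual Hugging Face Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/VeriReason-Qwen2.5-3B-Verilog-RTL-GRPO-reasoning-tb"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a Verilog module for a 4-bit synchronous up-counter with active-high reset."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```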