Nellyw888 committed on
Commit 25eecfd · verified · 1 Parent(s): b836b08

Update README.md

Files changed (1): README.md (+3 -1)
README.md CHANGED

@@ -18,6 +18,8 @@ base_model:
 2025.05.17: Initial release of VeriReason-Qwen2.5-3B-Verilog-RTL-GRPO-reasoning-tb
 
 ## Project Description
+
+This is the Model for the paper: VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation
 This study introduces VeriReason, a novel approach utilizing reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation.
 
 The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis in a smaller model size. By using our curated high-quality training examples alongside a feedback-driven reward model, this 3B parameter model delivers exceptional performance on Verilog generation tasks while maintaining efficiency.
@@ -78,4 +80,4 @@ The GRPO (Generative Reinforcement Learning from Preference Optimization) training
 ## Citation
 
 ## Acknowledgement
-This repo benefits from OpenR1 and LLamaFactory. Thanks for their wonderful works.
+This repo benefits from OpenR1 and LLamaFactory.