Update README.md
README.md CHANGED
```diff
@@ -18,6 +18,8 @@ base_model:
 2025.05.17: Initial release of VeriReason-Qwen2.5-3B-Verilog-RTL-GRPO-reasoning-tb
 
 ## Project Description
+
+This is the model from the paper "VeriReason: Reinforcement Learning with Testbench Feedback for Reasoning-Enhanced Verilog Generation".
 This study introduces VeriReason, a novel approach utilizing reinforcement learning with testbench feedback to enhance the performance of pre-trained models for Verilog RTL code generation. VeriReason-Qwen2.5-3B is a 3B parameter model based on Qwen2.5-Coder-3B that combines supervised fine-tuning with Guided Reward Proximal Optimization (GRPO) reinforcement learning, specifically tailored for RTL code generation.
 
 The model integrates explicit reasoning capabilities with reinforcement learning for Verilog generation, establishing a new state-of-the-art for automated RTL synthesis in a smaller model size. By using our curated high-quality training examples alongside a feedback-driven reward model, this 3B parameter model delivers exceptional performance on Verilog generation tasks while maintaining efficiency.
@@ -78,4 +80,4 @@ The GRPO (Generative Reinforcement Learning from Preference Optimization) traini
 ## Citation
 
 ## Acknowledgement
-This repo benefits from OpenR1 and LLamaFactory.
+This repo benefits from OpenR1 and LLamaFactory.
```
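The added project description centers on reinforcement learning driven by testbench feedback. As a rough sketch of how such a reward signal can be computed (an illustration under assumptions, not the authors' published pipeline: the Icarus Verilog toolchain, the file layout, and the PASS/FAIL logging convention are all placeholders), a generated module can be compiled and simulated against a reference testbench and scored by its pass rate:

```python
# Hypothetical testbench-feedback reward, sketched for illustration.
# Assumptions: Icarus Verilog is installed, and the testbench prints one
# "PASS"/"FAIL" line per check; neither is specified by the README above.
import subprocess
import tempfile
from pathlib import Path

def testbench_reward(verilog_src: str, testbench_src: str, timeout_s: int = 30) -> float:
    """Compile generated RTL against a testbench; return a reward in [0, 1]."""
    with tempfile.TemporaryDirectory() as tmp:
        design = Path(tmp) / "design.v"
        tb = Path(tmp) / "tb.v"
        sim = Path(tmp) / "sim.out"
        design.write_text(verilog_src)
        tb.write_text(testbench_src)

        # A design that fails to compile gets zero reward.
        compiled = subprocess.run(
            ["iverilog", "-o", str(sim), str(design), str(tb)],
            capture_output=True, text=True,
        )
        if compiled.returncode != 0:
            return 0.0

        try:
            run = subprocess.run(
                ["vvp", str(sim)], capture_output=True, text=True, timeout=timeout_s
            )
        except subprocess.TimeoutExpired:
            return 0.0  # runaway simulations are treated as failures

        # Fraction of testbench checks that passed.
        lines = run.stdout.splitlines()
        passed = sum("PASS" in ln for ln in lines)
        total = passed + sum("FAIL" in ln for ln in lines)
        return passed / total if total else 0.0
```

A graded pass rate, rather than a binary compile check, gives the policy a denser signal, which is the kind of feedback a GRPO-style update can exploit.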
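Since the commit documents a released checkpoint, a minimal generation sketch with Hugging Face `transformers` may also be useful; the repository id below is a placeholder (the actual Hub path is not stated in this diff), and no special prompt template is assumed:

```python
# Minimal inference sketch. The repo id is a placeholder; substitute the
# checkpoint's actual Hugging Face Hub path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/VeriReason-Qwen2.5-3B-Verilog-RTL-GRPO-reasoning-tb"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

prompt = "Write a Verilog module for a 4-bit synchronous up-counter with active-high reset."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```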