Update README.md
    	
README.md CHANGED
    
@@ -12,8 +12,8 @@ datasets:
 > Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-7B-Instruct on approximately 300,000 examples. It demonstrates strong test-time scaling performance improvements on various advanced multimodal reasoning benchmarks when used with Qwen2.5-VL and Gemma-3 models despite being trained mainly on abstract reasoning datasets and elementary reasoning datasets.
 
 - **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
-- **Repository:**
-- **Paper:** https://arxiv.org/
+- **Repository:** https://github.com/theogbrand/vlprm/
+- **Paper:** https://arxiv.org/pdf/2509.23250
 
 # Use
 
@@ -57,12 +57,12 @@ The model usage is documented [here](https://github.com/theogbrand/vlprm/blob/ma
 
 ```bibtex
 @misc{ong2025vlprms,
-      title={
-      author={Brandon Ong, Tej Deep Pala, Vernon Toh, William Chandra Tjhi and Soujanya Poria},
+      title={Training Vision-Language Process Reward Models for Test-Time Scaling in Multimodal Reasoning: Key Insights and Lessons Learned},
+      author={Brandon Ong, Tej Deep Pala, Vernon Toh, William Chandra Tjhi, and Soujanya Poria},
       year={2025},
-      eprint={},
+      eprint={2509.23250},
       archivePrefix={arXiv},
-      primaryClass={cs.
-      url={},
+      primaryClass={cs.AI},
+      url={https://arxiv.org/pdf/2509.23250},
 }
 ```