Update README.md
README.md
CHANGED
@@ -12,7 +12,7 @@ datasets:
 > Qwen-VL-PRM-7B is a process reward model finetuned from Qwen2.5-7B-Instruct on approximately 300,000 examples. It demonstrates strong test-time scaling performance improvements on various advanced multimodal reasoning benchmarks when used with Qwen2.5-VL and Gemma-3 models despite being trained mainly on abstract reasoning datasets and elementary reasoning datasets.
 
 - **Logs:** https://wandb.ai/aisg-arf/multimodal-reasoning/runs/pj4oc0qh
-- **Repository:** https://github.com/theogbrand/vlprm
+- **Repository:** https://github.com/theogbrand/vlprm
 - **Paper:** https://arxiv.org/pdf/2509.23250
 
 # Use