Create README.md
README.md
ADDED
@@ -0,0 +1,12 @@
| 1 |
+
---
|
| 2 |
+
license: mit
|
| 3 |
+
datasets:
|
| 4 |
+
- Leo-Dai/dapo-math-17k_dedup
|
| 5 |
+
---
|
| 6 |
+
# 🧠 Parallel-R1-Unseen_Step_200
|
| 7 |
+
|
| 8 |
+
> **Mid-Training Checkpoint of Parallel-R1: Towards Parallel Thinking via Reinforcement Learning**
|
| 9 |
+
> Stage: **After 200 RL steps via alternating rewards**. This checkpoint shows adaptive parallel reasoning ability and serves as the structure exploration stage.
|
| 10 |
+
|
| 11 |
+
This checkpoint is intended to help you reproduce the experimental results in Section 4.5 of the Parallel-R1 paper, "Extra Bonus: Parallel Thinking as a Mid-Training Exploration Strategy for RL Training".
|
| 12 |
+
|