Update README.md
README.md
CHANGED
|
@@ -21,6 +21,10 @@ language:
|
|
| 21 |
|
| 22 |
**Fanar-Math-R1-GRPO** is a reasoning-optimized language model built on [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct). This version is fine-tuned using **Group Relative Policy Optimization (GRPO)** from the DeepSeekMath framework on the [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. It is designed for step-by-step mathematical problem-solving with structured reasoning in both English and Arabic.
|
| 23 |
|
| 24 |
---
|
| 25 |
|
| 26 |
## Model Highlights
|
|
@@ -107,20 +111,6 @@ datasets==2.21.0
|
|
| 107 |
math_verify==0.3.3
|
| 108 |
torch==2.4.1
|
| 109 |
```
|
| 110 |
-
|
| 111 |
-
---
|
| 112 |
-
|
-## Training Metrics (Snapshot)
-
-| Step | Reward (avg) | Accuracy Reward | Format Reward | Loss   | KL Divergence |
-|------|--------------|-----------------|---------------|--------|---------------|
-| 10   | 0.029        | 0.029           | 0.0           | 0.0    | 0.00024       |
-| 100  | 0.039        | 0.039           | 0.0           | 0.0001 | 0.00188       |
-| 200  | 0.033        | 0.033           | 0.0           | 0.0001 | 0.00183       |
-| 300  | 0.045        | 0.045           | 0.0           | 0.0001 | 0.00127       |
-
-*Note: Training was run with a small config for notebook-friendly experimentation.*
|
| 123 |
-
|
| 124 |
---
|
| 125 |
|
| 126 |
## Output Format
|
|
|
|
| 21 |
|
| 22 |
**Fanar-Math-R1-GRPO** is a reasoning-optimized language model built on [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct). This version is fine-tuned using **Group Relative Policy Optimization (GRPO)** from the DeepSeekMath framework on the [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. It is designed for step-by-step mathematical problem-solving with structured reasoning in both English and Arabic.
|
| 23 |
|
| 24 |
+
|
| 25 |
+

|
| 26 |
+
|
| 27 |
+
|
| 28 |
---
|
| 29 |
|
| 30 |
## Model Highlights
|
|
|
|
| 111 |
math_verify==0.3.3
|
| 112 |
torch==2.4.1
|
| 113 |
```
|
| 114 |
---
|
| 115 |
|
| 116 |
## Output Format
|