Omartificial-Intelligence-Space committed on
Commit 3eb600d · verified · 1 Parent(s): 3df3c4e

Update README.md

Files changed (1)
  1. README.md +4 -14
README.md CHANGED
@@ -21,6 +21,10 @@ language:
 
  **Fanar-Math-R1-GRPO** is a reasoning-optimized language model built on [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct). This version is fine-tuned using **Group Relative Policy Optimization (GRPO)** from the DeepSeekMath framework on the [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. It is designed for step-by-step mathematical problem-solving with structured reasoning in both English and Arabic.
 
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/NEcy5S2aYn2ly2filngUp.png)
+
+
  ---
 
  ## 🚀 Model Highlights
@@ -107,20 +111,6 @@ datasets==2.21.0
  math_verify==0.3.3
  torch==2.4.1
  ```
-
- ---
-
- ## 📊 Training Metrics (Snapshot)
-
- | Step | Reward (avg) | Accuracy Reward | Format Reward | Loss | KL Divergence |
- |------|--------------|-----------------|---------------|-------|---------------|
- | 10 | 0.029 | 0.029 | 0.0 | 0.0 | 0.00024 |
- | 100 | 0.039 | 0.039 | 0.0 | 0.0001| 0.00188 |
- | 200 | 0.033 | 0.033 | 0.0 | 0.0001| 0.00183 |
- | 300 | 0.045 | 0.045 | 0.0 | 0.0001| 0.00127 |
-
- *Note: Training was run with a small config for notebook-friendly experimentation.*
-
  ---
 
  ## 📚 Output Format