Omartificial-Intelligence-Space/Fanar-Math-R1-GRPO

@@ -21,6 +21,10 @@ language:
 **Fanar-Math-R1-GRPO** is a reasoning-optimized language model built on [`QCRI/Fanar-1-9B-Instruct`](https://huggingface.co/QCRI/Fanar-1-9B-Instruct). This version is fine-tuned using **Group Relative Policy Optimization (GRPO)** from the DeepSeekMath framework on the [`AI-MO/NuminaMath-TIR`](https://huggingface.co/datasets/AI-MO/NuminaMath-TIR) dataset. It is designed for step-by-step mathematical problem-solving with structured reasoning in both English and Arabic.
+![image/png](https://cdn-uploads.huggingface.co/production/uploads/628f7a71dd993507cfcbe587/NEcy5S2aYn2ly2filngUp.png)
 ---
 ## 🚀 Model Highlights
@@ -107,20 +111,6 @@ datasets==2.21.0
 math_verify==0.3.3
 torch==2.4.1
 ```
----
-## 📊 Training Metrics (Snapshot)
-| Step | Reward (avg) | Accuracy Reward | Format Reward | Loss  | KL Divergence |
-|------|--------------|-----------------|---------------|-------|---------------|
-| 10   | 0.029        | 0.029           | 0.0           | 0.0   | 0.00024       |
-| 100  | 0.039        | 0.039           | 0.0           | 0.0001| 0.00188       |
-| 200  | 0.033        | 0.033           | 0.0           | 0.0001| 0.00183       |
-| 300  | 0.045        | 0.045           | 0.0           | 0.0001| 0.00127       |
-*Note: Training was run with a small config for notebook-friendly experimentation.*
 ---
 ## 📚 Output Format