QuantumStackOverflow/ASTER_4B_RL

QuantumStackOverflow committed 4 days ago

Commit ace14f6 · verified · 1 Parent(s): 1f511e1

Update README.md

Files changed (1)

README.md +70 -3

@@ -1,3 +1,70 @@
----
-license: apache-2.0
----

+---
+license: apache-2.0
+base_model: Qwen/Qwen3-4B-Thinking-2507
+tags:
+  - aster
+  - reinforcement-learning
+  - sft
+  - reproduction
+metrics:
+  - accuracy
+model-index:
+  - name: ASTER_4B
+    results:
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: AIME 2025
+          type: aime2025
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 87.7
+      - task:
+          type: text-generation
+          name: Text Generation
+        dataset:
+          name: HMMT 2025 Feb
+          type: hmmt_2025_feb
+        metrics:
+          - name: Accuracy
+            type: accuracy
+            value: 77.1
+---
+# ASTER_4B (Independent Reproduction)
+[![Paper](https://img.shields.io/badge/Paper-ArXiv.2602.01204-B31B1B.svg)](https://arxiv.org/pdf/2602.01204)
+[![GitHub](https://img.shields.io/badge/GitHub-Reproduction_Code-black)](https://github.com/Rainyrou/ASTER)
+[![License](https://img.shields.io/badge/License-Apache_2.0-green.svg)](https://huggingface.co/datasets/choosealicense/licenses/apache-2.0)
+## Model Description
+**ASTER_4B** is an independent reproduction of the ASTER framework. It is fine-tuned from [Qwen/Qwen3-4B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-4B-Thinking-2507) and follows the experimental details and hyperparameter settings described in the original ASTER paper.
+> ⚠️ **Note:** This is a **reproduction project**. Its goal is to verify the effectiveness of the ASTER method by following the details given in the official paper.
+## Training Data (SFT)
+The model was trained on our reproduced dataset, **ASTER_SFT4K**.
+It is a tiny yet effective SFT set, constructed to replicate the data distribution and formatting used in the original ASTER experiments. Dataset details are available here:
+* **Dataset Repo:** [ASTER_SFT4K](https://huggingface.co/datasets/QuantumStackOverflow/ASTER_SFT4K)
+## Evaluation Results
+We evaluated the model on two challenging mathematical benchmarks, AIME 2025 and HMMT 2025 (Feb). The evaluation used the **exact generation configuration** specified in the ASTER paper so that the scores can be compared fairly.
+**Generation Config:**
+* **Temperature:** `1.0`
+* **Top_p:** `1.0`
+* **Max_context_length:** `96256`
+| Benchmark | Accuracy (%) |
+| :--- | ---: |
+| **AIME 2025** | **87.7** |
+| **HMMT 2025 (Feb)** | **77.1** |
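
The generation config listed in the README can be captured in a small, library-independent structure. The following is a minimal sketch, not the authors' evaluation code: the names `GenerationConfig`, `max_new_tokens`, and `sampling_kwargs` are hypothetical, and it assumes that `Max_context_length` counts prompt plus completion tokens, which the README does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationConfig:
    """Sampling settings copied from the README's Generation Config list."""
    temperature: float = 1.0
    top_p: float = 1.0
    max_context_length: int = 96256


def max_new_tokens(cfg: GenerationConfig, prompt_tokens: int) -> int:
    """Completion budget left after the prompt.

    Assumption: the context limit covers prompt + completion tokens.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    if prompt_tokens >= cfg.max_context_length:
        raise ValueError("prompt does not fit in the context window")
    return cfg.max_context_length - prompt_tokens


def sampling_kwargs(cfg: GenerationConfig, prompt_tokens: int) -> dict:
    """Build generic sampling arguments (hypothetical key names).

    Map these keys onto whatever inference library is actually used.
    """
    return {
        "temperature": cfg.temperature,
        "top_p": cfg.top_p,
        "max_new_tokens": max_new_tokens(cfg, prompt_tokens),
    }


if __name__ == "__main__":
    # A 256-token prompt leaves 96256 - 256 = 96000 tokens for the completion.
    print(sampling_kwargs(GenerationConfig(), prompt_tokens=256))
```

With `temperature = 1.0` and `top_p = 1.0`, sampling draws from the model's unmodified next-token distribution, with no temperature scaling and no nucleus truncation.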