Qwen3-1.7B-GOPD-DeepMath

Model Details

Qwen3-1.7B fine-tuned with ExOPD (Extended Group Relative Policy Optimization) on DeepMath-103K mathematical reasoning dataset.

Training

Component Details
Base Model Qwen/Qwen3-1.7B
Algorithm ExOPD (GRPO + Rollout Correction)
Dataset zwhe99/DeepMath-103K
Filter difficulty ≥ 6 (Olympiad level)
Samples 8,000 problems
Teacher Model Keven16/Qwen3-4B-Non-Thinking-RL-Math-Step500
Epochs 3
Batch Size 256
Learning Rate 1e-5

Performance

TBD

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jindun/Qwen3-1.7B-GOPD-DeepMath", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("jindun/Qwen3-1.7B-GOPD-DeepMath")

Key Finding

Supervised Fine-Tuning (SFT) degraded performance (-16.67%). This demonstrates the limitations of imitation learning. GOPD learns genuine reasoning skills through trial-and-error exploration.

License

Apache 2.0

Downloads last month
5
Safetensors
Model size
2B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jindun/Qwen3-1.7B-GOPD-DeepMath

Finetuned
Qwen/Qwen3-1.7B
Finetuned
(780)
this model

Dataset used to train jindun/Qwen3-1.7B-GOPD-DeepMath