Qwen3-1.7B-GOPD-DeepMath

Model Details

Qwen3-1.7B fine-tuned with ExOPD (Extended Group Relative Policy Optimization) on DeepMath-103K mathematical reasoning dataset.

Training

Component	Details
Base Model	Qwen/Qwen3-1.7B
Algorithm	ExOPD (GRPO + Rollout Correction)
Dataset	zwhe99/DeepMath-103K
Filter	difficulty ≥ 6 (Olympiad level)
Samples	8,000 problems
Teacher Model	Keven16/Qwen3-4B-Non-Thinking-RL-Math-Step500
Epochs	3
Batch Size	256
Learning Rate	1e-5

Performance

TBD

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("jindun/Qwen3-1.7B-GOPD-DeepMath", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("jindun/Qwen3-1.7B-GOPD-DeepMath")

Key Finding

Supervised Fine-Tuning (SFT) degraded performance (-16.67%). This demonstrates the limitations of imitation learning. GOPD learns genuine reasoning skills through trial-and-error exploration.

License

Apache 2.0

Downloads last month: 5

Safetensors

Model size

2B params

Tensor type

BF16

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jindun/Qwen3-1.7B-GOPD-DeepMath

Base model

Qwen/Qwen3-1.7B-Base

Finetuned

Qwen/Qwen3-1.7B

Finetuned

(780)

this model

jindun
/

Qwen3-1.7B-GOPD-DeepMath