DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Paper • 2402.03300 • Published • 145
How to use michalsr/toolmerge-planner-grpo with Transformers:
# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText
processor = AutoProcessor.from_pretrained("michalsr/toolmerge-planner-grpo")
model = AutoModelForImageTextToText.from_pretrained("michalsr/toolmerge-planner-grpo")GRPO-finetuned planner from Qwen3-VL-8B-Instruct, used as the text-only query decomposer in the ToolMerge keyframe-retrieval pipeline.
Trained with TRL's GRPO trainer on Molmo-2 Moments (M2M) training data,
optimizing the frames-in-GT + consistency reward at global_step=50.
from transformers import AutoProcessor, AutoModelForCausalLM
processor = AutoProcessor.from_pretrained("michalsr/toolmerge-planner-grpo")
model = AutoModelForCausalLM.from_pretrained(
"michalsr/toolmerge-planner-grpo",
torch_dtype="bfloat16",
)
To use inside ToolMerge, override the planner checkpoint at the CLI:
toolmerge config=configs/m2m/qwen3_8.yaml \
model.base=michalsr/toolmerge-planner-grpo
| Setting | Value |
|---|---|
| Base model | Qwen/Qwen3-VL-8B-Instruct |
| Reward | frames_in_gt=1.0, consistency=1.0 |
| Training data | train_correct_uniform_8f_clip_max1.json (filtered M2M train split, ~1500 items) |
| Optimizer | paged_adamw_8bit, lr=1e-6, bf16 |
| Compute | 2 nodes × 4 GPUs |
| Step | global_step=50 |
| Framework | TRL 0.27.2, transformers 4.57.6, PyTorch 2.10.0 |
Full training config: training/configs/m2m_grpo.yaml
in the ToolMerge repo.
@inproceedings{toolmerge2026,
title = {Decomposing Queries into Tool Calls for Long-Video Keyframe Retrieval},
author = {TODO},
booktitle = {TODO},
year = {2026},
}
Cite the GRPO method:
@article{shao2024deepseekmath,
title = {{DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models}},
author = {Zhihong Shao and Peiyi Wang and Qihao Zhu and Runxin Xu and Junxiao Song and Mingchuan Zhang and Y. K. Li and Y. Wu and Daya Guo},
year = 2024,
eprint = {arXiv:2402.03300},
}
Code repo: https://github.com/michalsr/ToolMerge.
Base model
Qwen/Qwen3-VL-8B-Instruct