qwen2.5-7b_solver_iter3

SCOPE solver checkpoint trained with DR-Zero long-form GRPO.

  • Model alias: qwen2.5-7b
  • Base model: Qwen2.5-7B-Instruct
  • Version: v1.9.21
  • Solver iteration: 3
  • Global step: 60
  • Source experiment: long_solver_v1.9.21_iter3_grpo_group8_qwen2.5-7b-instruct
  • Source local HF checkpoint: checkpoints/dr-zero/long_solver_v1.9.21_iter3_grpo_group8_qwen2.5-7b-instruct/merged_hf_actor_gs60
  • Collection: https://huggingface.co/collections/wckwan/scope

This repository contains the Hugging Face-format checkpoint files for loading with Transformers.

Downloads last month
25
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Collection including wckwan/qwen2.5-7b_solver_iter3