YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Qwen3-1.7B GRPO for Python Code Generation on MBPP

This repo contains a GRPO training script for improving Qwen/Qwen3-1.7B-Base on Python code generation using executable rewards on the google-research-datasets/mbpp dataset.

Training objective

The reward function:

  • executes generated Python code in a subprocess,
  • scores whether it runs without errors,
  • checks whether MBPP assertions pass,
  • checks whether the target function has a proper docstring.

Reward weights in the script:

  • run without timeout/runtime failure: 0.25
  • pass assertions: 0.60
  • docstring present: 0.15

Dataset

  • Train/eval dataset: google-research-datasets/mbpp (sanitized config)
  • Verified columns: prompt, code, test_imports, test_list
  • The script converts the dataset to TRL GRPO prompt-only conversational format.

Model

  • Base model: Qwen/Qwen3-1.7B-Base
  • Architecture verified from model config: Qwen3ForCausalLM

Reference recipe

Published executable-feedback code RL recipes that informed this setup:

  • StepCoder (2402.01391): compiler/unit-test reward shaping on APPS+
  • ACECoder (2502.01718): large-scale synthesized test-case RLVR on code tasks
  • DeepSeekMath (2402.03300): GRPO algorithmic anchor

Launch example

python train_grpo_python_mbpp.py \
  --output_dir outputs/qwen3-1.7b-grpo-mbpp \
  --per_device_train_batch_size 1 \
  --gradient_accumulation_steps 16 \
  --num_generations 4 \
  --learning_rate 1e-6 \
  --max_prompt_length 512 \
  --max_completion_length 384 \
  --num_train_epochs 1 \
  --eval_strategy steps \
  --eval_steps 20 \
  --save_steps 20 \
  --logging_steps 1 \
  --bf16 True \
  --gradient_checkpointing True \
  --report_to trackio \
  --run_name grpo_qwen3_1p7b_mbpp_exec_reward \
  --project grpo-qwen3-python-code \
  --trackio_space_id AbhilekhMeda/mlintern-grpoqwen \
  --push_to_hub True \
  --hub_model_id AbhilekhMeda/qwen3-1.7b-grpo-python-mbpp
Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support