README: GRPO Internal (Colocate) Mode Execution Scripts
Introduction
The GRPO (Group Relative Policy Optimization) training framework can use high-performance inference engines such as vLLM to accelerate sampling. In Internal (colocate) mode, vLLM and the training process share the same GPUs, so no separate devices need to be set aside for inference.
This folder contains scripts and instructions for running GRPO in Internal mode.
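As a rough illustration of what sharing a GPU means in practice, the sketch below estimates how one GPU's memory is divided for a given --vllm_gpu_memory_utilization value. It assumes the value is the fraction of total GPU memory that vLLM may reserve (the meaning of gpu_memory_utilization in vLLM) and that training can use the remainder; the helper name memory_split is hypothetical, and the memory actually available to training is somewhat lower once framework overhead is counted.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate how one GPU's memory is divided in colocate mode.
# Assumes the vLLM ratio is a fraction of total GPU memory (as with vLLM's
# gpu_memory_utilization) and that training can use whatever remains.
memory_split() {
  local total_gib="$1"   # total memory of one GPU, in GiB
  local ratio="$2"       # value passed to --vllm_gpu_memory_utilization
  awk -v t="$total_gib" -v r="$ratio" \
    'BEGIN { printf "vllm=%.1fGiB train=%.1fGiB\n", t * r, t * (1 - r) }'
}

memory_split 80 0.4   # prints: vllm=32.0GiB train=48.0GiB
```

A lower ratio leaves more memory for training but gives vLLM less room for its KV cache, so the value is a trade-off between sampling throughput and training batch size.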
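Training in Internal Mode
To enable Internal mode, add the following arguments to the training command:
--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization [ut_ratio] \
Here [ut_ratio] is the fraction of each GPU's memory that vLLM is allowed to reserve (a value between 0 and 1); the remaining memory is left for training.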
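The three arguments above can be produced by a small wrapper that also rejects ratios outside the range vLLM accepts. This is a minimal sketch assuming bash 4 or later; the function name colocate_args is hypothetical, and the [ut_ratio] placeholder becomes its argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the colocate-mode vLLM arguments, one per line,
# for a given memory ratio (the [ut_ratio] placeholder).
colocate_args() {
  local ratio="$1"
  # Accept values in (0, 1] such as 1, 0.3, .45 or 1.0; reject 0, values
  # above 1, negatives and non-numeric input.
  local ratio_re='^(0?\.0*[1-9][0-9]*|1(\.0*)?)$'
  if [[ ! $ratio =~ $ratio_re ]]; then
    echo "invalid --vllm_gpu_memory_utilization value: $ratio" >&2
    return 1
  fi
  printf '%s\n' --use_vllm true --vllm_mode colocate \
    --vllm_gpu_memory_utilization "$ratio"
}

# Example: collect the arguments into an array that can be appended to the
# training command line.
mapfile -t vllm_args < <(colocate_args 0.5)
echo "extra args: ${vllm_args[*]}"
```

Emitting one argument per line keeps each flag and value as a separate array element, so the arguments survive intact when expanded with "${vllm_args[@]}".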
Multi-Node Training
On each node, run the original single-node training script with the environment variables NNODES (the total number of nodes) and NODE_RANK (the index of the current node, starting from 0) set, and use identical configuration parameters on all nodes.
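A per-node launcher for this setup might look like the sketch below. It assumes, as stated above, that the single-node training script reads NNODES and NODE_RANK from its environment; the function name launch_node and the script path train_single_node.sh are hypothetical placeholders. Launchers based on torchrun usually also need the master node's address and port, which this sketch does not set.

```shell
#!/usr/bin/env bash
# Hypothetical per-node launcher for multi-node Internal-mode training.
# Assumption: the single-node training script reads NNODES and NODE_RANK
# from its environment.
launch_node() {
  local nnodes="$1" node_rank="$2" train_script="$3"
  local pos_re='^[1-9][0-9]*$' nonneg_re='^(0|[1-9][0-9]*)$'
  # NNODES must be a positive integer and NODE_RANK an integer in [0, NNODES).
  if [[ ! $nnodes =~ $pos_re || ! $node_rank =~ $nonneg_re ]] \
     || (( node_rank >= nnodes )); then
    echo "invalid NNODES=$nnodes / NODE_RANK=$node_rank" >&2
    return 1
  fi
  # Set the variables for this invocation only; every node runs the same
  # script with the same configuration, differing only in NODE_RANK.
  NNODES="$nnodes" NODE_RANK="$node_rank" bash "$train_script"
}

# Example with placeholder values: on the second of two nodes, run
#   launch_node 2 1 ./train_single_node.sh
```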