README: GRPO Internal(Colocate) Mode Execution Scripts

NOTE

Introduction

The GRPO (Group Relative Policy Optimization) training framework supports high-performance inference engines like vLLM to accelerate the sampling process. The Internal Mode allows you to deploy vLLM and perform training using the same GPU resources.

This folder contains scripts and instructions for running GRPO in Internal Mode

Training with Internal mode

--use_vllm true \
--vllm_mode colocate \
--vllm_gpu_memory_utilization [ut_ratio] \

Multi-Node Training

On each node, execute the original single-node training script, using the environment variables NNODES and NODE_RANK, and ensure consistent use of configuration parameters across all nodes.