| SkyPilot Examples |
| ================= |
|
|
| Last updated: 09/04/2025. |
|
|
| This guide provides examples of running VERL reinforcement learning training on Kubernetes clusters or cloud platforms with GPU nodes using `SkyPilot <https://github.com/skypilot-org/skypilot>`_. |
|
|
| Installation and Configuration |
| ------------------------------- |
|
|
| Step 1: Install SkyPilot |
| ~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
| Choose the installation based on your target platform: |
|
|
| .. code-block:: bash |
|
|
| # For Kubernetes only |
| pip install "skypilot[kubernetes]" |
| |
| # For AWS |
| pip install "skypilot[aws]" |
| |
| # For Google Cloud Platform |
| pip install "skypilot[gcp]" |
| |
| # For Azure |
| pip install "skypilot[azure]" |
| |
| # For multiple platforms |
| pip install "skypilot[kubernetes,aws,gcp,azure]" |
|
|
| Step 2: Configure Your Platform |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
| See https://docs.skypilot.co/en/latest/getting-started/installation.html |
|
|
| Step 3: Set Up Environment Variables |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
| Export necessary API keys for experiment tracking: |
|
|
| .. code-block:: bash |
|
|
| # For Weights & Biases tracking |
| export WANDB_API_KEY="your-wandb-api-key" |
| |
| # For HuggingFace gated models (if needed) |
| export HF_TOKEN="your-huggingface-token" |
|
|
| Examples |
| -------- |
|
|
| All example configurations are available in the `examples/skypilot/ <https://github.com/volcengine/verl/tree/main/examples/skypilot>`_ directory on GitHub. See the `README <https://github.com/volcengine/verl/blob/main/examples/skypilot/README.md>`_ for additional details. |
|
|
| PPO Training |
| ~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky launch -c verl-ppo verl-ppo.yaml --secret WANDB_API_KEY -y |
|
|
| Runs PPO training on GSM8K dataset using Qwen2.5-0.5B-Instruct model across 2 nodes with H100 GPUs. Based on examples in ``examples/ppo_trainer/``. |
|
|
| `View verl-ppo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-ppo.yaml>`_ |
|
|
| GRPO Training |
| ~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky launch -c verl-grpo verl-grpo.yaml --secret WANDB_API_KEY -y |
|
|
| Runs GRPO (Group Relative Policy Optimization) training on MATH dataset using Qwen2.5-7B-Instruct model. Memory-optimized configuration for 2 nodes. Based on examples in ``examples/grpo_trainer/``. |
|
|
| `View verl-grpo.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-grpo.yaml>`_ |
|
|
| Multi-turn Tool Usage Training |
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky launch -c verl-multiturn verl-multiturn-tools.yaml \ |
| --secret WANDB_API_KEY --secret HF_TOKEN -y |
|
|
| Single-node training with 8xH100 GPUs for multi-turn tool usage with Qwen2.5-3B-Instruct. Includes tool and interaction configurations for GSM8K. Based on examples in ``examples/sglang_multiturn/`` but uses vLLM instead of sglang. |
|
|
| `View verl-multiturn-tools.yaml on GitHub <https://github.com/volcengine/verl/blob/main/examples/skypilot/verl-multiturn-tools.yaml>`_ |
|
|
| Configuration |
| ------------- |
|
|
| The example YAML files are pre-configured with: |
|
|
| - **Infrastructure**: Kubernetes clusters (``infra: k8s``) - can be changed to ``infra: aws`` or ``infra: gcp``, etc. |
| - **Docker Image**: VERL's official Docker image with CUDA 12.6 support |
| - **Setup**: Automatically clones and installs VERL from source |
| - **Datasets**: Downloads required datasets during setup phase |
| - **Ray Cluster**: Configures distributed training across nodes |
| - **Logging**: Supports Weights & Biases via ``--secret WANDB_API_KEY`` |
| - **Models**: Supports gated HuggingFace models via ``--secret HF_TOKEN`` |
|
|
| Launch Command Options |
| ---------------------- |
|
|
| - ``-c <name>``: Cluster name for managing the job |
| - ``--secret KEY``: Pass secrets for API keys (can be used multiple times) |
| - ``-y``: Skip confirmation prompt |
|
|
| Monitoring Your Jobs |
| -------------------- |
|
|
| Check Cluster Status |
| ~~~~~~~~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky status |
|
|
| View Logs |
| ~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky logs verl-ppo # View logs for the PPO job |
|
|
| SSH into Head Node |
| ~~~~~~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| ssh verl-ppo |
|
|
| Access Ray Dashboard |
| ~~~~~~~~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky status --endpoint 8265 verl-ppo # Get dashboard URL |
|
|
| Stop a Cluster |
| ~~~~~~~~~~~~~~ |
|
|
| .. code-block:: bash |
|
|
| sky down verl-ppo |
|
|