| verl x Ascend |
| =================================== |
|
|
| Last updated: 08/15/2025. |
|
|
| 我们在 verl 上增加对华为昇腾设备的支持。 |
|
|
| 硬件支持 |
| ----------------------------------- |
|
|
| Atlas 200T A2 Box16 |
|
|
| Atlas 900 A2 PODc |
|
|
| Atlas 800T A3 |
|
|
|
|
| 安装 |
| ----------------------------------- |
|
|
| 基础环境准备 |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| +-----------+-------------+ |
| | software | version | |
| +-----------+-------------+ |
| | Python | == 3.10 | |
| +-----------+-------------+ |
| | CANN | == 8.1.RC1 | |
| +-----------+-------------+ |
| | torch | == 2.5.1 | |
| +-----------+-------------+ |
| | torch_npu | == 2.5.1 | |
| +-----------+-------------+ |
|
|
| 基础环境准备请参照这份 `文档 <https://gitee.com/ascend/pytorch>`_ 。 |
|
|
| vllm & vllm-ascend |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| 为了能够在 verl 中正常使用 vllm,需使用以下命令编译安装 vllm 和 vllm-ascend。请注意根据机器类型区分安装方式。 |
|
|
| .. code-block:: bash |
| |
| # vllm |
| git clone -b v0.7.3 --depth 1 https://github.com/vllm-project/vllm.git |
| cd vllm |
| pip install -r requirements-build.txt |
|
|
| # for Atlas 200T A2 Box16 |
| VLLM_TARGET_DEVICE=empty pip install -e . --extra-index https://download.pytorch.org/whl/cpu/ |
| |
| # for Atlas 900 A2 PODc |
| VLLM_TARGET_DEVICE=empty pip install -e . |
|
|
| .. code-block:: bash |
| |
| # vllm-ascend |
| git clone -b v0.7.3.post1 --depth 1 https://github.com/vllm-project/vllm-ascend.git |
| cd vllm-ascend |
| export COMPILE_CUSTOM_KERNELS=1 |
| python setup.py install |
|
|
| 安装verl |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| .. code-block:: bash |
|
|
| git clone https://github.com/volcengine/verl.git |
| cd verl |
| pip install -r requirements-npu.txt |
| pip install -e . |
|
|
| 其他三方库说明 |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
|
|
| +--------------+---------------+ |
| | software | description | |
| +--------------+---------------+ |
| | transformers | v4.52.4 | |
| +--------------+---------------+ |
| | flash_attn | not supported | |
| +--------------+---------------+ |
| | liger-kernel | not supported | |
| +--------------+---------------+ |
|
|
| 1. 支持通过 transformers 使能 --flash_attention_2, transformers 需等于 4.52.4版本。 |
| 2. 不支持通过 flash_attn 使能 flash attention 加速。 |
| 3. 不支持 liger-kernel 使能。 |
| 4. 针对 x86 服务器,需要安装 cpu 版本的 torchvision。 |
|
|
| .. code-block:: bash |
|
|
| pip install torchvision==0.20.1+cpu --index-url https://download.pytorch.org/whl/cpu |
|
|
|
|
| 快速开始 |
| ----------------------------------- |
| 正式使用前,建议您通过对Qwen2.5-0.5B GRPO的训练尝试以检验环境准备和安装的正确性。 |
|
|
| 1.下载数据集并将数据集预处理为parquet格式,以便包含计算RL奖励所需的必要字段 |
|
|
| .. code-block:: bash |
|
|
| python3 examples/data_preprocess/gsm8k.py --local_save_dir ~/data/gsm8k |
|
|
| 2.执行训练 |
|
|
| .. code-block:: bash |
|
|
| set -x |
|
|
| export VLLM_ATTENTION_BACKEND=XFORMERS |
|
|
| python3 -m verl.trainer.main_ppo \ |
| algorithm.adv_estimator=grpo \ |
| data.train_files=$HOME/data/gsm8k/train.parquet \ |
| data.val_files=$HOME/data/gsm8k/test.parquet \ |
| data.train_batch_size=128 \ |
| data.max_prompt_length=512 \ |
| data.max_response_length=128 \ |
| data.filter_overlong_prompts=True \ |
| data.truncation='error' \ |
| actor_rollout_ref.model.path=Qwen/Qwen2.5-0.5B-Instruct \ |
| actor_rollout_ref.actor.optim.lr=5e-7 \ |
| actor_rollout_ref.model.use_remove_padding=False \ |
| actor_rollout_ref.actor.entropy_coeff=0.001 \ |
| actor_rollout_ref.actor.ppo_mini_batch_size=64 \ |
| actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=20 \ |
| actor_rollout_ref.actor.use_kl_loss=True \ |
| actor_rollout_ref.actor.kl_loss_coef=0.001 \ |
| actor_rollout_ref.actor.kl_loss_type=low_var_kl \ |
| actor_rollout_ref.model.enable_gradient_checkpointing=True \ |
| actor_rollout_ref.actor.fsdp_config.param_offload=False \ |
| actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \ |
| actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=40 \ |
| actor_rollout_ref.rollout.enable_chunked_prefill=False \ |
| actor_rollout_ref.rollout.tensor_model_parallel_size=2 \ |
| actor_rollout_ref.rollout.name=vllm \ |
| actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \ |
| actor_rollout_ref.rollout.n=5 \ |
| actor_rollout_ref.ref.log_prob_micro_batch_size_per_gpu=40 \ |
| actor_rollout_ref.ref.fsdp_config.param_offload=True \ |
| algorithm.kl_ctrl.kl_coef=0.001 \ |
| trainer.critic_warmup=0 \ |
| trainer.logger=console \ |
| trainer.project_name='verl_grpo_example_gsm8k' \ |
| trainer.experiment_name='qwen2_7b_function_rm' \ |
| trainer.n_gpus_per_node=8 \ |
| trainer.nnodes=1 \ |
| trainer.save_freq=-1 \ |
| trainer.test_freq=5 \ |
| trainer.total_epochs=1 \ |
| trainer.device=npu $@ |
|
|
| (可选) 设置MindSpeed训练后端指导 |
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ |
| 1. 参考 `MindSpeed README <https://gitee.com/ascend/MindSpeed>`_ 说明安装 MindSpeed 加速库。 |
|
|
| 2. 使能 verl worker 模型 ``strategy`` 配置为 ``megatron`` ,例如 ``actor_rollout_ref.actor.strategy=megatron``。 |
|
|
| 3. MindSpeed 自定义入参可通过 ``override_transformer_config`` 参数传入,例如对 actor 模型开启 FA 特性可使用 ``+actor_rollout_ref.actor.megatron.override_transformer_config.use_flash_attn=True``。 |
|
|
| 4. 更多特性信息可参考 `MindSpeed+verl 文档 <https://gitee.com/ascend/MindSpeed/blob/master/docs/user-guide/verl.md>`_ 。 |
|
|
| 支持现状 |
| ----------------------------------- |
|
|
| **表1** RL类算法 |
|
|
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | algorithm | model | actor.strategy | rollout.name | hardware | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen2.5-7B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen2.5-32B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen2.5-VL-3B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen2.5-VL-7B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen2.5-VL-32B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen3-8B | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | GRPO | Qwen3-32B | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen2.5-7B-instruct | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen2.5-32B | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen3-8B-base | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen3-14B-base | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen3-30B-A3B-base | FSDP | vllm-ascend | Atlas 200T A2 Box16 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | DAPO | Qwen3-30B-A3B | megatron | vllm-ascend | Atlas 800T A3 | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
| | PPO | Qwen3-8B | FSDP | vllm-ascend | Atlas 900 A2 PODc | |
| +-----------+-------------------------+-------------------+-------------------+--------------------------+ |
|
|
| **表2** SFT类算法 |
|
|
| +-----------+-------------------------+-------------------+----------------------+ |
| | algorithm | model | actor.strategy | hardware | |
| +-----------+-------------------------+-------------------+----------------------+ |
| | SFT-PEFT | Qwen3-8B | FSDP | Atlas 900 A2 PODc | |
| +-----------+-------------------------+-------------------+----------------------+ |
| | ReTool-SFT| Qwen2.5-7B-instruct | FSDP | Atlas 900 A2 PODc | |
| +-----------+-------------------------+-------------------+----------------------+ |
|
|
|
|
|
|
| 计划 |
| ----------------------------------- |
|
|
| 查看 `roadmap <https://github.com/volcengine/verl/discussions/2171>`_ 获取更多特性的支持进度。 |
|
|
|
|
|
|
| 声明 |
| ----------------------------------- |
| verl中提供的ascend支持代码皆为参考样例,如在生产环境中使用请通过官方正式途径沟通,谢谢。 |
|
|