LunarLander PPO Pro

Production-grade PPO agent (Pretrain + Polishing pipeline), benchmarked against beachcities/ppo-LunarLander-v3-A100-SOTA.

Evaluation results

Metric                                   Mean     Std      Notes
Course benchmark (10 random episodes)    170.96   104.50   cert = 66.47
Global Deterministic (n = 100)           265.66   53.41    Primary acceptance metric
Best-Batch Top-10                        310.02   3.94     Best single episode: 319
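The cert value in the course row matches mean minus standard deviation (170.96 − 104.50 ≈ 66.46). The sketch below assumes that the table uses the Hugging Face Deep RL Course convention (score = mean_reward − std_reward) and that std is the population standard deviation, as numpy.std computes by default. The helper names, and the reading of "Top-10" as the mean of the ten highest episode returns, are hypothetical illustrations and are not taken from this repository.

```python
import statistics


def cert_score(mean_reward: float, std_reward: float) -> float:
    """Course-style score: mean minus standard deviation (assumed convention)."""
    return mean_reward - std_reward


def summarize(returns: list[float]) -> dict[str, float]:
    """Mean, population std (ddof=0, like numpy.std), and the assumed cert score."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return {"mean": mean, "std": std, "cert": cert_score(mean, std)}


def top_k_mean(returns: list[float], k: int = 10) -> float:
    """Mean of the k highest episode returns (hypothetical reading of 'Top-10')."""
    return statistics.fmean(sorted(returns, reverse=True)[:k])


# Recompute the course row from its reported (rounded) mean and std.
print(round(cert_score(170.96, 104.50), 2))  # 66.46
```

The result differs from the table's 66.47 by 0.01. This is presumably because the table's cert was computed from unrounded mean and std values.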

The replay video shows the first episode of the deterministic policy, which scores about 283.
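For discrete actions, a PPO policy outputs a categorical distribution over actions. Deterministic evaluation takes the most likely action (the mode), while stochastic rollouts sample from the distribution. Stable-Baselines3, for example, exposes this choice through `model.predict(obs, deterministic=True)`. The minimal stdlib sketch below shows the difference. It assumes a categorical policy head; the logits and function names are hypothetical and are not this agent's actual outputs or code.

```python
import math
import random


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_action(logits: list[float], deterministic: bool, rng: random.Random) -> int:
    """Pick an action from a categorical policy (hypothetical helper)."""
    if deterministic:
        # Greedy: index of the highest logit, i.e. the mode of the distribution.
        return max(range(len(logits)), key=logits.__getitem__)
    # Stochastic: sample an action index in proportion to its probability.
    probs = softmax(logits)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]


# Hypothetical logits for LunarLander's 4 discrete actions:
# 0 = do nothing, 1 = fire left engine, 2 = fire main engine, 3 = fire right engine.
logits = [0.1, 2.0, -1.0, 0.5]
rng = random.Random(0)
print(select_action(logits, deterministic=True, rng=rng))  # 1
```

Because the greedy choice removes sampling noise, deterministic scores such as the n = 100 row above are usually less variable than stochastic rollouts of the same policy.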

Usage

See unit8-lunarlander-ppo-pro/README.md in the repository.


Evaluation results (model index)

  • global_mean_deterministic on LunarLander-v3: 265.66 +/- 53.41 (self-reported)