step 8_520_000. Checkpoint taken from the initial model and trained further at a lower learning rate (2nd)
ad6b331
{"mean_reward": 258.1, "std_reward": 236.890459917659, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2023-01-22T14:31:52.925925"} |