Retrained to 10M steps with a higher play_against_latest_model_ratio (0.75 instead of 0.25). This helped the model learn to play defense better.
f5c5d35
Statos6
committed on
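
As a rough illustration of what raising play_against_latest_model_ratio changes, here is a minimal sketch. It assumes the setting acts as the probability that an episode is played against the latest opponent policy rather than an older snapshot, which is how self-play trainers such as Unity ML-Agents describe it. The function names, the snapshot list, and the episode count are hypothetical and are not taken from this repository's training code.

```python
import random


def pick_opponent(latest, snapshots, play_against_latest_model_ratio, rng):
    """Choose the opponent policy for one episode (illustrative only)."""
    # With probability `play_against_latest_model_ratio`, face the newest policy.
    # Fall back to the newest policy as well if no snapshots exist yet.
    if not snapshots or rng.random() < play_against_latest_model_ratio:
        return latest
    # Otherwise sample uniformly from the stored past snapshots.
    return rng.choice(snapshots)


def latest_fraction(ratio, episodes=10_000, seed=0):
    """Estimate how often the latest policy is chosen for a given ratio."""
    rng = random.Random(seed)
    snapshots = [f"snapshot_{i}" for i in range(10)]  # hypothetical snapshot pool
    hits = sum(
        pick_opponent("latest", snapshots, ratio, rng) == "latest"
        for _ in range(episodes)
    )
    return hits / episodes


if __name__ == "__main__":
    for ratio in (0.25, 0.75):
        print(f"ratio={ratio}: latest opponent in {latest_fraction(ratio):.1%} of episodes")
```

Under that assumption, moving from 0.25 to 0.75 means roughly three out of four episodes are played against the current opponent instead of one in four, so the agent spends most of its training time against the strongest available attacker.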