Update README.md

Files changed (1)

README.md CHANGED

@@ -60,4 +60,4 @@ This model is nearing SOTA performance for the Freeway environment: https://www.
 The composite score at 10 million timesteps is ~32 which is only two points off SOTA of 34. It appears that with PPO even after 2BN timesteps performance can only reach 33.6 - https://huggingface.co/edbeeching/atari_2B_atari_freeway_3333
-I suspect that as with QR-DQN the SAC and TQC models can reach 34 - they just need more training to do so. I may create a QR-DQN model later to see but this environment is nearly SOTA solved and will not be the focus of many future experiments.
+I suspect that, as with QR-DQN, the SAC and TQC models can reach 34; they just need more training to do so. I actually found that my QR-DQN model was inferior to SAC alone at 10 million timesteps, although I didn't seed the model, so I cannot be 100% sure at this point.