MattStammers committed on
Commit
485fe66
1 Parent(s): 24aea20

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -60,4 +60,4 @@ This model is nearing SOTA performance for the Freeway environment: https://www.
 
  The composite score at 10 million timesteps is ~32 which is only two points off SOTA of 34. It appears that with PPO even after 2BN timesteps performance can only reach 33.6 - https://huggingface.co/edbeeching/atari_2B_atari_freeway_3333
 
- I suspect that as with QR-DQN the SAC and TQC models can reach 34 - they just need more training to do so. I actually found that my QR-DQN model was inferior to SAC alone at 10 million timesteps although I didn't seed the model so cannot be 100% sure at this point.
+ I suspect that as with QR-DQN the SAC and TQC models can reach 34 - they just need more training to do so.