MattStammers committed
Commit • 91ec6a6
Parent(s): 48d2809
Nearly SOTA solved
Huge progress made in this environment. SAC is so far the winner. With a bit more training it would likely reach a score of 34.
README.md CHANGED
@@ -53,4 +53,11 @@ python -m sf_examples.atari.train_atari --algo=ASAC --env=atari_freeway --train_
 ```
 
 Note, you may have to adjust `--train_for_env_steps` to a suitably high number, as the experiment will resume at the number of steps it concluded at.
-
+
+## SOTA Performance
+
+This model is nearing SOTA performance for the Freeway environment (https://www.endtoend.ai/envs/gym/atari/freeway/), beating TQC and certainly DQN/PPO, both of which failed to converge after 10 million timesteps.
+
+The composite score at 10 million timesteps is ~32, only two points off the SOTA of 34. It appears that even with PPO, performance reaches only 33.6 after 2B timesteps: https://huggingface.co/edbeeching/atari_2B_atari_freeway_3333
+
+I suspect that, as with QR-DQN, the SAC and TQC models can reach 34; they just need more training to do so. I may create a QR-DQN model later to check, but this environment is nearly SOTA solved and will not be the focus of many future experiments.
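The resume note in the diff above can be sketched as a concrete invocation. The flags are taken from the command shown in the diff header; the step budget of 20M is an illustrative assumption, not a value from the commit:

```shell
# Sketch: resume the Freeway run with a raised step budget.
# --train_for_env_steps must exceed the step count the previous run
# stopped at (here, above 10M), otherwise training exits immediately.
python -m sf_examples.atari.train_atari \
    --algo=ASAC \
    --env=atari_freeway \
    --train_for_env_steps=20000000
```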