Roberto committed
Commit 1746469
1 Parent(s): 6a0df06

Initial commit

README.md CHANGED
@@ -6,7 +6,7 @@ tags:
  - reinforcement-learning
  - stable-baselines3
  model-index:
- - name: PPO
+ - name: DQN
  results:
  - task:
  type: reinforcement-learning
@@ -16,13 +16,13 @@ model-index:
  type: SpaceInvadersNoFrameskip-v4
  metrics:
  - type: mean_reward
- value: 808.50 +/- 361.36
+ value: 892.50 +/- 340.74
  name: mean_reward
  verified: false
  ---
 
- # **PPO** Agent playing **SpaceInvadersNoFrameskip-v4**
- This is a trained model of a **PPO** agent playing **SpaceInvadersNoFrameskip-v4**
+ # **DQN** Agent playing **SpaceInvadersNoFrameskip-v4**
+ This is a trained model of a **DQN** agent playing **SpaceInvadersNoFrameskip-v4**
  using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
  and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
 
@@ -38,37 +38,39 @@ SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
 
  ```
  # Download model and save it into the logs/ folder
- python -m rl_zoo3.load_from_hub --algo ppo --env SpaceInvadersNoFrameskip-v4 -orga Roberto -f logs/
- python enjoy.py --algo ppo --env SpaceInvadersNoFrameskip-v4 -f logs/
+ python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga Roberto -f logs/
+ python enjoy.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
  ```
 
  If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
  ```
- python -m rl_zoo3.load_from_hub --algo ppo --env SpaceInvadersNoFrameskip-v4 -orga Roberto -f logs/
- rl_zoo3 enjoy --algo ppo --env SpaceInvadersNoFrameskip-v4 -f logs/
+ python -m rl_zoo3.load_from_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -orga Roberto -f logs/
+ rl_zoo3 enjoy --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
  ```
 
  ## Training (with the RL Zoo)
  ```
- python train.py --algo ppo --env SpaceInvadersNoFrameskip-v4 -f logs/
+ python train.py --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/
  # Upload the model and generate video (when possible)
- python -m rl_zoo3.push_to_hub --algo ppo --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga Roberto
+ python -m rl_zoo3.push_to_hub --algo dqn --env SpaceInvadersNoFrameskip-v4 -f logs/ -orga Roberto
  ```
 
  ## Hyperparameters
  ```python
- OrderedDict([('batch_size', 256),
- ('clip_range', 'lin_0.1'),
- ('ent_coef', 0.01),
+ OrderedDict([('batch_size', 32),
+ ('buffer_size', 100000),
  ('env_wrapper',
  ['stable_baselines3.common.atari_wrappers.AtariWrapper']),
+ ('exploration_final_eps', 0.01),
+ ('exploration_fraction', 0.1),
  ('frame_stack', 4),
- ('learning_rate', 'lin_2.5e-4'),
- ('n_envs', 8),
- ('n_epochs', 4),
- ('n_steps', 128),
+ ('gradient_steps', 1),
+ ('learning_rate', 0.0001),
+ ('learning_starts', 100000),
  ('n_timesteps', 10000000.0),
+ ('optimize_memory_usage', False),
  ('policy', 'CnnPolicy'),
- ('vf_coef', 0.5),
+ ('target_update_interval', 1000),
+ ('train_freq', 4),
  ('normalize', False)])
  ```
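The DQN hyperparameters in this README set `exploration_fraction` to 0.1 and `exploration_final_eps` to 0.01 over `n_timesteps` of 10,000,000. The sketch below illustrates the linearly annealed epsilon these values would imply. It assumes a starting epsilon of 1.0 (the commit does not list `exploration_initial_eps`, so this is an assumption) and a linear decay over the first fraction of training. `epsilon_at` is a hypothetical helper written for illustration, not a Stable-Baselines3 function.

```python
def epsilon_at(step, total_timesteps=10_000_000, exploration_fraction=0.1,
               initial_eps=1.0, final_eps=0.01):
    """Linearly anneal epsilon from initial_eps to final_eps over the first
    exploration_fraction of training, then hold it at final_eps.

    initial_eps=1.0 is an assumption; it is not stated in the commit.
    """
    decay_steps = exploration_fraction * total_timesteps
    progress = min(step / decay_steps, 1.0)
    return initial_eps + progress * (final_eps - initial_eps)


# Epsilon at the start, halfway through the decay, and after the decay ends.
for step in (0, 500_000, 1_000_000, 5_000_000):
    print(step, round(epsilon_at(step), 4))
```

Under these assumptions the decay would finish after about 1,000,000 steps, after which the agent would act greedily 99% of the time.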
args.yml CHANGED
@@ -1,6 +1,6 @@
  !!python/object/apply:collections.OrderedDict
  - - - algo
- - ppo
+ - dqn
  - - conf_file
  - null
  - - device
@@ -54,7 +54,7 @@
  - - save_replay_buffer
  - false
  - - seed
- - 1410119825
+ - 1190980102
  - - storage
  - null
  - - study_name
config.yml CHANGED
@@ -1,25 +1,29 @@
  !!python/object/apply:collections.OrderedDict
  - - - batch_size
- - 256
- - - clip_range
- - lin_0.1
- - - ent_coef
- - 0.01
+ - 32
+ - - buffer_size
+ - 100000
  - - env_wrapper
  - - stable_baselines3.common.atari_wrappers.AtariWrapper
+ - - exploration_final_eps
+ - 0.01
+ - - exploration_fraction
+ - 0.1
  - - frame_stack
  - 4
+ - - gradient_steps
+ - 1
  - - learning_rate
- - lin_2.5e-4
- - - n_envs
- - 8
- - - n_epochs
- - 4
- - - n_steps
- - 128
+ - 0.0001
+ - - learning_starts
+ - 100000
  - - n_timesteps
  - 10000000.0
+ - - optimize_memory_usage
+ - false
  - - policy
  - CnnPolicy
- - - vf_coef
- - 0.5
+ - - target_update_interval
+ - 1000
+ - - train_freq
+ - 4
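The config.yml change swaps a PPO hyperparameter set for a DQN one. As a rough way to summarize it, the sketch below transcribes both sides of the diff into plain dictionaries and reports which keys were removed, added, or changed. The variable names are illustrative only, and the dictionaries are hand-copied from the diff rather than loaded from the YAML file.

```python
# Hyperparameters transcribed by hand from the old (PPO) and new (DQN)
# sides of the config.yml diff.
wrapper = ["stable_baselines3.common.atari_wrappers.AtariWrapper"]

old_config = {
    "batch_size": 256, "clip_range": "lin_0.1", "ent_coef": 0.01,
    "env_wrapper": wrapper, "frame_stack": 4, "learning_rate": "lin_2.5e-4",
    "n_envs": 8, "n_epochs": 4, "n_steps": 128, "n_timesteps": 10000000.0,
    "policy": "CnnPolicy", "vf_coef": 0.5,
}
new_config = {
    "batch_size": 32, "buffer_size": 100000, "env_wrapper": wrapper,
    "exploration_final_eps": 0.01, "exploration_fraction": 0.1,
    "frame_stack": 4, "gradient_steps": 1, "learning_rate": 0.0001,
    "learning_starts": 100000, "n_timesteps": 10000000.0,
    "optimize_memory_usage": False, "policy": "CnnPolicy",
    "target_update_interval": 1000, "train_freq": 4,
}

# Set operations on the key views give the three categories of change.
removed = sorted(old_config.keys() - new_config.keys())
added = sorted(new_config.keys() - old_config.keys())
changed = sorted(k for k in old_config.keys() & new_config.keys()
                 if old_config[k] != new_config[k])

print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

Only `batch_size` and `learning_rate` are shared keys whose values differ; the rest of the change is PPO-specific keys leaving and DQN-specific keys arriving.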
dqn-SpaceInvadersNoFrameskip-v4.zip CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:90891d61dc5e16f246814babd3eebe4c53e51ccfeed146e5b468e23b817fbc65
- size 13719727
+ oid sha256:7799bb75a0957ba4816da88b2f04eaaff09bd2dab5407643b47e94229530b500
+ size 27224904
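The binary files in this commit appear in the diff as Git LFS pointer files: three `key value` lines giving the spec version, the object ID, and the size in bytes. A minimal sketch of reading such a pointer follows; `parse_lfs_pointer` is a hypothetical helper, and it assumes the simple three-line layout shown in this diff rather than handling every case the LFS spec allows.

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash and size fields.

    Assumes each line has the form "<key> <value>", as in this diff.
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid value has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


# New-side pointer for dqn-SpaceInvadersNoFrameskip-v4.zip from this commit.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:7799bb75a0957ba4816da88b2f04eaaff09bd2dab5407643b47e94229530b500
size 27224904
"""
info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```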
dqn-SpaceInvadersNoFrameskip-v4/data CHANGED
The diff for this file is too large to render. See raw diff
 
dqn-SpaceInvadersNoFrameskip-v4/policy.optimizer.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:f1e067afe9912f3dd1b7925918b8cbe439229f6008e572c9c7e431ae731419f1
- size 687
+ oid sha256:ad841495214761f229b0f45b3914cbe83dea6bfa4a5f0f563197fcb8df95214f
+ size 13505419
dqn-SpaceInvadersNoFrameskip-v4/policy.pth CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:7f823934be8aca8ed4e0ef45a70013d70e4fe120c6c3b6c1d673577629bb07ad
- size 13504937
+ oid sha256:fdeaf40f2bd414d341781ddaf39e1a4a94ce9e09dcb93d0e475f17c3d90385d3
+ size 13504745
dqn-SpaceInvadersNoFrameskip-v4/system_info.txt CHANGED
@@ -1,7 +1,7 @@
- OS: Linux-5.10.133+-x86_64-with-glibc2.27 #1 SMP Fri Aug 26 08:44:51 UTC 2022
- Python: 3.8.16
+ OS: Linux-5.15.0-56-generic-x86_64-with-glibc2.35 #62-Ubuntu SMP Tue Nov 22 19:54:14 UTC 2022
+ Python: 3.9.15
  Stable-Baselines3: 1.6.2
- PyTorch: 1.13.0+cu116
- GPU Enabled: True
- Numpy: 1.21.6
+ PyTorch: 1.13.1+cu117
+ GPU Enabled: False
+ Numpy: 1.24.0
  Gym: 0.21.0
replay.mp4 CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:85bfc6ed9433811788324a541ce8b129a920cb965bae7796fe02c1a772a65659
- size 203960
+ oid sha256:40a9f1812bd15d2fc97d7f87743d9093e2f64c994cea01c5945212b96d119541
+ size 195972
results.json CHANGED
@@ -1 +1 @@
- {"mean_reward": 808.5, "std_reward": 361.35889362239305, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2022-12-21T09:29:07.019988"}
+ {"mean_reward": 892.5, "std_reward": 340.74367198819704, "is_deterministic": false, "n_eval_episodes": 10, "eval_datetime": "2022-12-24T17:16:22.255684"}
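The `mean_reward` value in the README metadata (`892.50 +/- 340.74`) matches the `mean_reward` and `std_reward` fields of the new results.json when each is rounded to two decimals. A short sketch of that correspondence, using only the standard library; how the RL Zoo actually produces the string is not shown in this commit, so the formatting here is an assumption.

```python
import json

# New-side contents of results.json from this commit.
results_json = (
    '{"mean_reward": 892.5, "std_reward": 340.74367198819704, '
    '"is_deterministic": false, "n_eval_episodes": 10, '
    '"eval_datetime": "2022-12-24T17:16:22.255684"}'
)

results = json.loads(results_json)

# Format mean and standard deviation to two decimals, as in the README value.
summary = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(summary)  # 892.50 +/- 340.74
```

With `n_eval_episodes` at 10 and a standard deviation of roughly 341, the gap between the old mean (808.5) and the new one (892.5) is well within one standard deviation.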
train_eval_metrics.zip CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:271a39b9269e2be53529507d07f422263036bf2d60154c534c274b85d139e3f2
- size 314162
+ oid sha256:4b41fa0a29e21fab4b06e0f9a07bc5c3a2d3887a8976ec789a81a0dcf00730b2
+ size 135924