pableitorr committed on
Commit
578daac
1 Parent(s): 16b6aec

Initial commit

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.mp4 filter=lfs diff=lfs merge=lfs -text
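The added `*.mp4` rule routes video files through Git LFS alongside the existing archive and TensorBoard patterns. As a rough sketch of which paths these rules would catch, the snippet below matches each file's basename against the patterns from the hunk above. It assumes Python's `fnmatchcase` is a close enough stand-in for Git's glob matching on slash-free patterns; the helper name `is_lfs_tracked` and the example paths are hypothetical.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Patterns copied from the .gitattributes hunk above.
LFS_PATTERNS = ["*.zip", "*.zst", "*tfevents*", "*.mp4"]


def is_lfs_tracked(path: str) -> bool:
    """Approximate check: compare the file's basename with each pattern.

    This is only an approximation of Git's attribute matching, which has
    additional rules (for example for patterns that contain a slash).
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in LFS_PATTERNS)


# Hypothetical paths, used for illustration only.
for candidate in ["replay.mp4", "ppo-CarRacing-v2.zip", "README.md"]:
    print(candidate, is_lfs_tracked(candidate))
```

For an authoritative answer on a real checkout, `git check-attr filter -- <path>` reports the attribute Git actually applies.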
README.md ADDED
@@ -0,0 +1,95 @@
+ ---
+ library_name: stable-baselines3
+ tags:
+ - CarRacing-v2
+ - deep-reinforcement-learning
+ - reinforcement-learning
+ - stable-baselines3
+ model-index:
+ - name: PPO
+   results:
+   - task:
+       type: reinforcement-learning
+       name: reinforcement-learning
+     dataset:
+       name: CarRacing-v2
+       type: CarRacing-v2
+     metrics:
+     - type: mean_reward
+       value: 785.88 +/- 189.75
+       name: mean_reward
+       verified: false
+ ---
+
+ # **PPO** Agent playing **CarRacing-v2**
+ This is a trained model of a **PPO** agent playing **CarRacing-v2**
+ using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3)
+ and the [RL Zoo](https://github.com/DLR-RM/rl-baselines3-zoo).
+
+ The RL Zoo is a training framework for Stable Baselines3
+ reinforcement learning agents,
+ with hyperparameter optimization and pre-trained agents included.
+
+ ## Usage (with SB3 RL Zoo)
+
+ RL Zoo: https://github.com/DLR-RM/rl-baselines3-zoo<br/>
+ SB3: https://github.com/DLR-RM/stable-baselines3<br/>
+ SB3 Contrib: https://github.com/Stable-Baselines-Team/stable-baselines3-contrib
+
+ Install the RL Zoo (with SB3 and SB3-Contrib):
+ ```bash
+ pip install rl_zoo3
+ ```
+
+ ```
+ # Download model and save it into the logs/ folder
+ python -m rl_zoo3.load_from_hub --algo ppo --env CarRacing-v2 -orga pableitorr -f logs/
+ python -m rl_zoo3.enjoy --algo ppo --env CarRacing-v2 -f logs/
+ ```
+
+ If you installed the RL Zoo3 via pip (`pip install rl_zoo3`), from anywhere you can do:
+ ```
+ python -m rl_zoo3.load_from_hub --algo ppo --env CarRacing-v2 -orga pableitorr -f logs/
+ python -m rl_zoo3.enjoy --algo ppo --env CarRacing-v2 -f logs/
+ ```
+
+ ## Training (with the RL Zoo)
+ ```
+ python -m rl_zoo3.train --algo ppo --env CarRacing-v2 -f logs/
+ # Upload the model and generate video (when possible)
+ python -m rl_zoo3.push_to_hub --algo ppo --env CarRacing-v2 -f logs/ -orga pableitorr
+ ```
+
+ ## Hyperparameters
+ ```python
+ OrderedDict([('batch_size', 128),
+              ('clip_range', 0.2),
+              ('ent_coef', 0.0),
+              ('env_wrapper',
+               [{'rl_zoo3.wrappers.FrameSkip': {'skip': 2}},
+                {'gymnasium.wrappers.resize_observation.ResizeObservation': {'shape': 64}},
+                {'gymnasium.wrappers.gray_scale_observation.GrayScaleObservation': {'keep_dim': True}}]),
+              ('frame_stack', 2),
+              ('gae_lambda', 0.95),
+              ('gamma', 0.99),
+              ('learning_rate', 'lin_1e-4'),
+              ('max_grad_norm', 0.5),
+              ('n_envs', 8),
+              ('n_epochs', 10),
+              ('n_steps', 512),
+              ('n_timesteps', 1000000),
+              ('normalize', "{'norm_obs': False, 'norm_reward': True}"),
+              ('policy', 'CnnPolicy'),
+              ('policy_kwargs',
+               'dict(log_std_init=-2, ortho_init=False, activation_fn=nn.GELU, '
+               'net_arch=dict(pi=[256], vf=[256]), )'),
+              ('sde_sample_freq', 4),
+              ('use_sde', True),
+              ('vf_coef', 0.5),
+              ('normalize_kwargs', {'norm_obs': False, 'norm_reward': False})])
+ ```
+
+ # Environment Arguments
+ ```python
+ {'render_mode': 'rgb_array'}
+ ```
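The Hyperparameters block above implies a fixed amount of data and optimisation work per PPO update. The arithmetic below is a sketch of that relationship, using only the values listed in the README. It assumes the usual on-policy loop in which each update consumes one full rollout of `n_envs * n_steps` transitions and training continues until the timestep budget is reached; the variable names are illustrative.

```python
import math

# Values copied from the Hyperparameters block above.
n_envs = 8
n_steps = 512
batch_size = 128
n_epochs = 10
n_timesteps = 1_000_000

# Transitions collected before each policy update.
rollout_size = n_envs * n_steps

# Minibatches per pass over the rollout, and gradient steps per update.
minibatches_per_epoch = rollout_size // batch_size
gradient_steps_per_update = minibatches_per_epoch * n_epochs

# Assumption: whole rollouts are collected until the budget is met,
# so the last rollout may overshoot n_timesteps slightly.
n_updates = math.ceil(n_timesteps / rollout_size)

print(f"rollout size:              {rollout_size}")
print(f"minibatches per epoch:     {minibatches_per_epoch}")
print(f"gradient steps per update: {gradient_steps_per_update}")
print(f"policy updates (approx.):  {n_updates}")
```

Because `rollout_size` is an exact multiple of `batch_size`, no truncated minibatch is produced at the end of an epoch.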
args.yml ADDED
@@ -0,0 +1,83 @@
+ !!python/object/apply:collections.OrderedDict
+ - - - algo
+     - ppo
+   - - conf_file
+     - null
+   - - device
+     - auto
+   - - env
+     - CarRacing-v2
+   - - env_kwargs
+     - null
+   - - eval_env_kwargs
+     - null
+   - - eval_episodes
+     - 5
+   - - eval_freq
+     - 25000
+   - - gym_packages
+     - []
+   - - hyperparams
+     - n_timesteps: 1000000
+   - - log_folder
+     - logs/
+   - - log_interval
+     - -1
+   - - max_total_trials
+     - null
+   - - n_eval_envs
+     - 1
+   - - n_evaluations
+     - null
+   - - n_jobs
+     - 1
+   - - n_startup_trials
+     - 10
+   - - n_timesteps
+     - -1
+   - - n_trials
+     - 500
+   - - no_optim_plots
+     - false
+   - - num_threads
+     - -1
+   - - optimization_log_path
+     - null
+   - - optimize_hyperparameters
+     - false
+   - - progress
+     - false
+   - - pruner
+     - median
+   - - sampler
+     - tpe
+   - - save_freq
+     - -1
+   - - save_replay_buffer
+     - false
+   - - seed
+     - 1405327760
+   - - storage
+     - null
+   - - study_name
+     - null
+   - - tensorboard_log
+     - ''
+   - - track
+     - false
+   - - trained_agent
+     - ''
+   - - truncate_last_trajectory
+     - true
+   - - uuid
+     - false
+   - - vec_env
+     - dummy
+   - - verbose
+     - 1
+   - - wandb_entity
+     - null
+   - - wandb_project_name
+     - sb3
+   - - wandb_tags
+     - []
config.yml ADDED
@@ -0,0 +1,45 @@
+ !!python/object/apply:collections.OrderedDict
+ - - - batch_size
+     - 128
+   - - clip_range
+     - 0.2
+   - - ent_coef
+     - 0.0
+   - - env_wrapper
+     - - rl_zoo3.wrappers.FrameSkip:
+           skip: 2
+       - gymnasium.wrappers.resize_observation.ResizeObservation:
+           shape: 64
+       - gymnasium.wrappers.gray_scale_observation.GrayScaleObservation:
+           keep_dim: true
+   - - frame_stack
+     - 2
+   - - gae_lambda
+     - 0.95
+   - - gamma
+     - 0.99
+   - - learning_rate
+     - lin_1e-4
+   - - max_grad_norm
+     - 0.5
+   - - n_envs
+     - 8
+   - - n_epochs
+     - 10
+   - - n_steps
+     - 512
+   - - n_timesteps
+     - 1000000
+   - - normalize
+     - '{''norm_obs'': False, ''norm_reward'': True}'
+   - - policy
+     - CnnPolicy
+   - - policy_kwargs
+     - dict(log_std_init=-2, ortho_init=False, activation_fn=nn.GELU, net_arch=dict(pi=[256],
+       vf=[256]), )
+   - - sde_sample_freq
+     - 4
+   - - use_sde
+     - true
+   - - vf_coef
+     - 0.5
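The `learning_rate` entry in this config is the string `lin_1e-4` rather than a number. The sketch below shows one way such a value could be turned into a schedule, assuming the `lin_` prefix means a linear decay from the given initial value to zero as training progresses. The function names are hypothetical and are not taken from the RL Zoo source.

```python
from typing import Callable, Union


def linear_schedule(initial_value: float) -> Callable[[float], float]:
    """Return a schedule that scales initial_value by the remaining progress.

    progress_remaining is expected to go from 1.0 (start) to 0.0 (end).
    """
    def schedule(progress_remaining: float) -> float:
        return progress_remaining * initial_value

    return schedule


def parse_learning_rate(spec: Union[str, float]) -> Callable[[float], float]:
    """Hypothetical helper: map 'lin_<value>' to a linear schedule,
    and any plain number to a constant schedule."""
    if isinstance(spec, str) and spec.startswith("lin_"):
        return linear_schedule(float(spec[len("lin_"):]))
    constant = float(spec)
    return lambda _progress_remaining: constant


lr = parse_learning_rate("lin_1e-4")
for progress in (1.0, 0.5, 0.0):
    print(f"progress_remaining={progress:.1f} -> lr={lr(progress):.2e}")
```

Under that assumption the run starts at a learning rate of 1e-4 and reaches zero when the 1,000,000-timestep budget is exhausted.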
env_kwargs.yml ADDED
@@ -0,0 +1 @@
+ render_mode: rgb_array
ppo-CarRacing-v2.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3b55ac729c8ff928d036c96e92b8b53fc564bba77e16fee652836e4d322887f8
+ size 10490486
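Each LFS-tracked file in this commit is stored in the repository as a three-line pointer (`version`, `oid`, `size`) rather than as the binary itself. The sketch below parses the pointer shown above into its fields, assuming every line is a key and a value separated by a single space. The function name `parse_lfs_pointer` is hypothetical.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Split a Git LFS pointer into its version, hash algorithm, digest and size."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    hash_algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": hash_algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


# Pointer contents copied from the ppo-CarRacing-v2.zip diff above.
pointer_text = """\
version https://git-lfs.github.com/spec/v1
oid sha256:3b55ac729c8ff928d036c96e92b8b53fc564bba77e16fee652836e4d322887f8
size 10490486
"""

pointer = parse_lfs_pointer(pointer_text)
print(pointer["hash_algo"], pointer["digest"][:12], pointer["size"])
print(f"{pointer['size'] / 2**20:.2f} MiB")
```

The digest can be compared with a SHA-256 hash of the downloaded file to confirm that the archive arrived intact.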
ppo-CarRacing-v2/_stable_baselines3_version ADDED
@@ -0,0 +1 @@
+ 2.3.2
ppo-CarRacing-v2/data ADDED
The diff for this file is too large to render. See raw diff
 
ppo-CarRacing-v2/policy.optimizer.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3bf16f650dcc26458369238a14ef9c1b5a65ff29011b395674f2cfb41d367b5b
+ size 6919613
ppo-CarRacing-v2/policy.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f6861fb116501e30b9e58bc528a318c395c07c9bada55d53aa6a2941154facd6
+ size 3461859
ppo-CarRacing-v2/pytorch_variables.pth ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:fb4dde0c1ad63b7740276006a06cc491b21b407ea6c889928c223ec77ddad79f
+ size 864
ppo-CarRacing-v2/system_info.txt ADDED
@@ -0,0 +1,9 @@
+ - OS: Windows-10-10.0.22631-SP0 10.0.22631
+ - Python: 3.10.11
+ - Stable-Baselines3: 2.3.2
+ - PyTorch: 2.4.1+cu124
+ - GPU Enabled: True
+ - Numpy: 1.26.3
+ - Cloudpickle: 3.0.0
+ - Gymnasium: 0.29.1
+ - OpenAI Gym: 0.26.2
results.json ADDED
@@ -0,0 +1 @@
+ {"mean_reward": 785.8818884, "std_reward": 189.75214840739886, "is_deterministic": true, "n_eval_episodes": 10, "eval_datetime": "2024-09-29T16:06:58.602089"}
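The `mean_reward` value in the README front matter (`785.88 +/- 189.75`) matches the numbers in this file rounded to two decimals. A minimal sketch of that formatting, using the JSON exactly as committed; the variable names are illustrative.

```python
import json

# Contents of results.json, copied from the diff above.
raw = (
    '{"mean_reward": 785.8818884, "std_reward": 189.75214840739886, '
    '"is_deterministic": true, "n_eval_episodes": 10, '
    '"eval_datetime": "2024-09-29T16:06:58.602089"}'
)

results = json.loads(raw)

# Round the mean and standard deviation to two decimals for display.
summary = f"{results['mean_reward']:.2f} +/- {results['std_reward']:.2f}"
print(summary)
print(f"episodes: {results['n_eval_episodes']}, deterministic: {results['is_deterministic']}")
```

With only 10 evaluation episodes, the large standard deviation relative to the mean suggests that the score varies considerably from episode to episode.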
train_eval_metrics.zip ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7590b2965dc0f1d02f7b4c5cebc4dced4d731643ed56018a9c7edc13d5b9470e
+ size 62998
vec_normalize.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9f7603ae1ced4e0b073635fd6ff1528293216e62369703d61170f3f2c6af54a3
+ size 50767