---
library_name: sample-factory
tags:
- deep-reinforcement-learning
- reinforcement-learning
- sample-factory
model-index:
- name: APPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: atari_bowling
      type: atari_bowling
    metrics:
    - type: mean_reward
      value: 46.40 +/- 5.28
      name: mean_reward
      verified: false
---

An **APPO** model trained on the **atari_bowling** environment.

This model was trained using Sample-Factory 2.0: https://github.com/alex-petrenko/sample-factory.
Documentation for how to use Sample-Factory can be found at https://www.samplefactory.dev/


## Downloading the model

After installing Sample-Factory, download the model with:
```
python -m sample_factory.huggingface.load_from_hub -r MattStammers/APPO-atari_bowling
```
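The same download can be scripted from Python. This is a minimal sketch that shells out to the command above with `subprocess`. It assumes Sample-Factory is installed in the interpreter running the script. `load_from_hub_command` and `download_model` are hypothetical helper names, not part of Sample-Factory.

```python
import subprocess
import sys
from typing import List

REPO_ID = "MattStammers/APPO-atari_bowling"


def load_from_hub_command(repo_id: str) -> List[str]:
    """Build the download command shown above, run with the current Python interpreter."""
    return [sys.executable, "-m", "sample_factory.huggingface.load_from_hub", "-r", repo_id]


def download_model(repo_id: str = REPO_ID) -> None:
    """Run the download; needs Sample-Factory installed and network access to the Hub."""
    # check=True raises CalledProcessError if the download command exits with an error.
    subprocess.run(load_from_hub_command(repo_id), check=True)
```

Calling `download_model()` performs the download. Using `sys.executable` ensures the command runs in the same Python environment where Sample-Factory was installed.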

    
## About the Model

Like all the other models in this benchmark, this model was initially trained asynchronously and unseeded for 10 million steps to establish a Sample-Factory async baseline for this environment, but only 3 of the 57 made it.

The aim is to reach state-of-the-art (SOTA) performance on each Atari environment. I will flag models as SOTA when they reach or come close to these levels.

The hyperparameters used in this model are the ones I have pushed to my fork of Sample-Factory: https://github.com/MattStammers/sample-factory. Because https://huggingface.co/edbeeching kindly shared his, I saved time and energy by reusing many of his tuned hyperparameters to maximise performance. However, he trained for 2 billion steps. As explained above, I started at 10 million steps and then moved to 100 million to see how performance develops:
```
{
  "device": "gpu",
  "seed": 1234,
  "num_policies": 2,
  "async_rl": true,
  "serial_mode": false,
  "batched_sampling": true,
  "num_batches_to_accumulate": 2,
  "worker_num_splits": 1,
  "policy_workers_per_policy": 1,
  "max_policy_lag": 1000,
  "num_workers": 16,
  "num_envs_per_worker": 2,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
  "rollout": 128,
  "recurrence": 1,
  "shuffle_minibatches": false,
  "gamma": 0.99,
  "reward_scale": 1.0,
  "reward_clip": 1000.0,
  "value_bootstrap": false,
  "normalize_returns": true,
  "exploration_loss_coeff": 0.0004677351413,
  "value_loss_coeff": 0.5,
  "kl_loss_coeff": 0.0,
  "exploration_loss": "entropy",
  "gae_lambda": 0.95,
  "ppo_clip_ratio": 0.1,
  "ppo_clip_value": 1.0,
  "with_vtrace": false,
  "vtrace_rho": 1.0,
  "vtrace_c": 1.0,
  "optimizer": "adam",
  "adam_eps": 1e-05,
  "adam_beta1": 0.9,
  "adam_beta2": 0.999,
  "max_grad_norm": 0.0,
  "learning_rate": 0.0003033891184,
  "lr_schedule": "linear_decay",
  "lr_schedule_kl_threshold": 0.008,
  "lr_adaptive_min": 1e-06,
  "lr_adaptive_max": 0.01,
  "obs_subtract_mean": 0.0,
  "obs_scale": 255.0,
  "normalize_input": true,
  "normalize_input_keys": [
    "obs"
  ],
  "decorrelate_experience_max_seconds": 0,
  "decorrelate_envs_on_one_worker": true,
  "actor_worker_gpus": [],
  "set_workers_cpu_affinity": true,
  "force_envs_single_thread": false,
  "default_niceness": 0,
  "log_to_file": true,
  "experiment_summaries_interval": 3,
  "flush_summaries_interval": 30,
  "stats_avg": 100,
  "summaries_use_frameskip": true,
  "heartbeat_interval": 10,
  "heartbeat_reporting_interval": 60,
  "train_for_env_steps": 100000000,
  "train_for_seconds": 10000000000,
  "save_every_sec": 120,
  "keep_checkpoints": 2,
  "load_checkpoint_kind": "latest",
  "save_milestones_sec": 1200,
  "save_best_every_sec": 5,
  "save_best_metric": "reward",
  "save_best_after": 100000,
  "benchmark": false,
  "encoder_mlp_layers": [
    512,
    512
  ],
  "encoder_conv_architecture": "convnet_atari",
  "encoder_conv_mlp_layers": [
    512
  ],
  "use_rnn": false,
  "rnn_size": 512,
  "rnn_type": "gru",
  "rnn_num_layers": 1,
  "decoder_mlp_layers": [],
  "nonlinearity": "relu",
  "policy_initialization": "orthogonal",
  "policy_init_gain": 1.0,
  "actor_critic_share_weights": true,
  "adaptive_stddev": false,
  "continuous_tanh_scale": 0.0,
  "initial_stddev": 1.0,
  "use_env_info_cache": false,
  "env_gpu_actions": false,
  "env_gpu_observations": true,
  "env_frameskip": 4,
  "env_framestack": 4
}
```
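To make a few of these settings concrete, the sketch below derives some quantities from them. It assumes Sample-Factory 2.0 semantics, in which `batch_size` is the SGD minibatch size and one training iteration uses `batch_size * num_batches_per_epoch` samples. The linear decay of the learning rate to zero over `train_for_env_steps` is a simplifying assumption, not necessarily the exact schedule Sample-Factory implements for `"lr_schedule": "linear_decay"`. The helper names are hypothetical.

```python
import json

# A subset of the hyperparameters listed above, copied verbatim from this card.
CONFIG_JSON = """
{
  "num_workers": 16,
  "num_envs_per_worker": 2,
  "batch_size": 1024,
  "num_batches_per_epoch": 8,
  "num_epochs": 4,
  "rollout": 128,
  "learning_rate": 0.0003033891184,
  "train_for_env_steps": 100000000
}
"""

cfg = json.loads(CONFIG_JSON)


def derived_quantities(cfg: dict) -> dict:
    """Hypothetical helper: quantities implied by the config (assumes Sample-Factory 2.0 semantics)."""
    samples_per_iteration = cfg["batch_size"] * cfg["num_batches_per_epoch"]
    return {
        # Parallel environments across all rollout workers.
        "total_envs": cfg["num_workers"] * cfg["num_envs_per_worker"],
        # Experience collected before each round of updates.
        "samples_per_iteration": samples_per_iteration,
        # Rollout fragments of `rollout` steps needed to fill one iteration.
        "rollouts_per_iteration": samples_per_iteration // cfg["rollout"],
        # Gradient steps per iteration: every minibatch, once per epoch.
        "sgd_steps_per_iteration": cfg["num_batches_per_epoch"] * cfg["num_epochs"],
    }


def linear_decay_lr(cfg: dict, env_steps: int) -> float:
    """ASSUMPTION: learning rate falls linearly from its initial value to 0 at train_for_env_steps."""
    progress = min(env_steps / cfg["train_for_env_steps"], 1.0)
    return cfg["learning_rate"] * (1.0 - progress)


print(derived_quantities(cfg))
# {'total_envs': 32, 'samples_per_iteration': 8192, 'rollouts_per_iteration': 64, 'sgd_steps_per_iteration': 32}
print(linear_decay_lr(cfg, 50_000_000))  # roughly half the initial learning rate
```

Under these assumptions, each iteration trains on 8,192 samples (64 rollouts of 128 steps) and performs 32 minibatch updates.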


    
## Using the model

To run the model after download, use the `enjoy` script corresponding to this environment:
```
python -m sf_examples.atari.enjoy_atari --algo=APPO --env=atari_bowling --train_dir=./train_dir --experiment=APPO-atari_bowling
```


You can also upload models to the Hugging Face Hub using the same script with the `--push_to_hub` flag.
See https://www.samplefactory.dev/10-huggingface/huggingface/ for more details.
    
## Training with this model

To continue training with this model, use the `train` script corresponding to this environment:
```
python -m sf_examples.atari.train_atari --algo=APPO --env=atari_bowling --train_dir=./train_dir --experiment=APPO-atari_bowling --restart_behavior=resume --train_for_env_steps=10000000000
```

Note: you may need to set `--train_for_env_steps` to a suitably high number, because the experiment resumes from the step count at which it previously stopped.
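Because the step counter carries over when resuming, `--train_for_env_steps` acts as a total target rather than an additional budget. A minimal sketch, assuming you already know how many environment steps the checkpoint has completed (`steps_done` is a hypothetical input you supply), builds the resume command from this card with the target set to the steps already done plus the extra steps you want. `resume_train_command` is a hypothetical helper, not part of Sample-Factory.

```python
import sys
from typing import List


def resume_train_command(steps_done: int, extra_steps: int,
                         train_dir: str = "./train_dir",
                         experiment: str = "APPO-atari_bowling") -> List[str]:
    """Hypothetical helper: build this card's resume command with an absolute step target."""
    if steps_done < 0 or extra_steps <= 0:
        raise ValueError("steps_done must be >= 0 and extra_steps must be > 0")
    # The resumed run stops once its cumulative step count reaches this value,
    # so add the new budget to the steps already completed.
    target = steps_done + extra_steps
    return [
        sys.executable, "-m", "sf_examples.atari.train_atari",
        "--algo=APPO",
        "--env=atari_bowling",
        f"--train_dir={train_dir}",
        f"--experiment={experiment}",
        "--restart_behavior=resume",
        f"--train_for_env_steps={target}",
    ]


# Example: continue a run that stopped at 100M steps for another 50M.
print(" ".join(resume_train_command(100_000_000, 50_000_000)[1:]))
```

The resulting list can be passed to `subprocess.run(..., check=True)` to launch training.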