Francesco-A committed on
Commit 2a24662
1 Parent(s): b316424

Update README.md

![thumbnail.png](https://cdn-uploads.huggingface.co/production/uploads/6493577a357b252af725bf67/wO94FZwazqho096MpER93.png)

Files changed (1)
  1. README.md +78 -14
README.md CHANGED
@@ -5,31 +5,95 @@ tags:
  - deep-reinforcement-learning
  - reinforcement-learning
  - ML-Agents-Pyramids
  ---

  # **ppo** Agent playing **Pyramids**
  This is a trained model of a **ppo** agent playing **Pyramids**
  using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

- ## Usage (with ML-Agents)
- The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

- We wrote a complete tutorial to learn to train your first agent using ML-Agents and publish it to the Hub:
- - A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your
- browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
- - A *longer tutorial* to understand how works ML-Agents:
- https://huggingface.co/learn/deep-rl-course/unit5/introduction

  ### Resume the training
  ```bash
  mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
  ```

- ### Watch your Agent play
- You can watch your agent **playing directly in your browser**

- 1. If the environment is part of ML-Agents official environments, go to https://huggingface.co/unity
- 2. Step 1: Find your model_id: Francesco-A/ppo-Pyramids-v1
- 3. Step 2: Select your *.nn /*.onnx file
- 4. Click on Watch the agent play 👀
-
  - deep-reinforcement-learning
  - reinforcement-learning
  - ML-Agents-Pyramids
+ license: apache-2.0
  ---

  # **ppo** Agent playing **Pyramids**
  This is a trained model of a **ppo** agent playing **Pyramids**
  using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

+ ## Watch the Agent play
+ You can watch the agent playing directly in your browser:

+ 1. Go to https://huggingface.co/spaces/unity/ML-Agents-Pyramids
+ 2. Find the model_id: Francesco-A/ppo-Pyramids-v1
+ 3. Select the *.nn / *.onnx file
+ 4. Click on Watch the agent play
 
  ### Resume the training
  ```bash
  mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
  ```
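
Filled in for this repository, the command might look like the sketch below; the configuration path and run id are assumptions (neither is recorded in this card), so substitute the values from your own training run:

```shell
# Assumed paths: ./config/ppo/PyramidsRND.yaml and run-id "Pyramids-v1"
# are examples, not values stored in this repo.
mlagents-learn ./config/ppo/PyramidsRND.yaml --run-id="Pyramids-v1" --resume
```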

+ ### Training hyperparameters
+ ```yaml
+ behaviors:
+   Pyramids:
+     trainer_type: ppo
+     hyperparameters:
+       batch_size: 128
+       buffer_size: 2048
+       learning_rate: 0.0003
+       beta: 0.01
+       epsilon: 0.2
+       lambd: 0.95
+       num_epoch: 3
+       learning_rate_schedule: linear
+     network_settings:
+       normalize: false
+       hidden_units: 512
+       num_layers: 2
+       vis_encode_type: simple
+     reward_signals:
+       extrinsic:
+         gamma: 0.99
+         strength: 1.0
+       rnd:
+         gamma: 0.99
+         strength: 0.01
+         network_settings:
+           hidden_units: 64
+           num_layers: 3
+         learning_rate: 0.0001
+     keep_checkpoints: 5
+     max_steps: 1000000
+     time_horizon: 128
+     summary_freq: 30000
+ ```
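
To make a few of these values concrete, here is a plain-Python sketch (not ML-Agents code) of how they interact; the constants are copied from the config, and the schedule helper is illustrative:

```python
# Constants copied from the YAML config above.
batch_size = 128
buffer_size = 2048
num_epoch = 3
learning_rate = 0.0003
max_steps = 1_000_000

# Each time the experience buffer fills, PPO makes num_epoch passes over it
# in minibatches of batch_size, so one buffer yields this many updates:
updates_per_buffer = (buffer_size // batch_size) * num_epoch
print(updates_per_buffer)  # 48

# With learning_rate_schedule: linear, the rate decays toward 0 at max_steps.
def linear_lr(step: int) -> float:
    return learning_rate * max(0.0, 1.0 - step / max_steps)

print(linear_lr(500_000))  # halfway through training: 0.00015
```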

+ ## Training details
+ | Step   | Time Elapsed | Mean Reward | Std of Reward | Status   |
+ |--------|--------------|-------------|---------------|----------|
+ | 30000  | 59.481 s     | -1.000      | 0.000         | Training |
+ | 60000  | 118.648 s    | -0.798      | 0.661         | Training |
+ | 90000  | 180.684 s    | -0.701      | 0.808         | Training |
+ | 120000 | 240.734 s    | -0.931      | 0.373         | Training |
+ | 150000 | 300.978 s    | -0.851      | 0.588         | Training |
+ | 180000 | 360.137 s    | -0.934      | 0.361         | Training |
+ | 210000 | 424.326 s    | -1.000      | 0.000         | Training |
+ | 240000 | 484.774 s    | -0.849      | 0.595         | Training |
+ | 270000 | 546.089 s    | -0.377      | 1.029         | Training |
+ | 300000 | 614.797 s    | -0.735      | 0.689         | Training |
+ | 330000 | 684.241 s    | -0.926      | 0.405         | Training |
+ | 360000 | 745.790 s    | -0.819      | 0.676         | Training |
+ | 390000 | 812.573 s    | -0.715      | 0.755         | Training |
+ | 420000 | 877.836 s    | -0.781      | 0.683         | Training |
+ | 450000 | 944.423 s    | -0.220      | 1.114         | Training |
+ | 480000 | 1010.918 s   | -0.484      | 0.962         | Training |
+ | 510000 | 1074.058 s   | -0.003      | 1.162         | Training |
+ | 540000 | 1138.848 s   | -0.021      | 1.222         | Training |
+ | 570000 | 1204.326 s   | 0.384       | 1.231         | Training |
+ | 600000 | 1276.488 s   | 0.690       | 1.174         | Training |
+ | 630000 | 1345.297 s   | 0.943       | 1.058         | Training |
+ | 660000 | 1412.791 s   | 1.014       | 1.043         | Training |
+ | 690000 | 1482.712 s   | 0.927       | 1.054         | Training |
+ | 720000 | 1548.726 s   | 0.900       | 1.128         | Training |
+ | 750000 | 1618.284 s   | 1.379       | 0.701         | Training |
+ | 780000 | 1692.080 s   | 1.567       | 0.359         | Training |
+ | 810000 | 1762.159 s   | 1.475       | 0.567         | Training |
+ | 840000 | 1832.166 s   | 1.438       | 0.648         | Training |
+ | 870000 | 1907.191 s   | 1.534       | 0.536         | Training |
+ | 900000 | 1977.521 s   | 1.552      | 0.478         | Training |
+ | 930000 | 2051.259 s   | 1.458       | 0.633         | Training |
+ | 960000 | 2126.498 s   | 1.545       | 0.586         | Training |
+ | 990000 | 2198.591 s   | 1.565       | 0.591         | Training |
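
As a rough summary of the run, a small sketch in plain Python; the mean-reward values are transcribed from a few rows of the table above:

```python
# Mean reward at a few checkpoints, transcribed from the table above.
mean_reward = {
    30_000: -1.000,   # start: the agent essentially never solves the pyramid
    510_000: -0.003,  # reward crosses zero around the halfway point
    990_000: 1.565,   # last summary before max_steps
}

# Overall improvement over the course of training.
improvement = mean_reward[990_000] - mean_reward[30_000]
print(round(improvement, 3))  # 2.565
```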