---
library_name: ml-agents
tags:
- SnowballTarget
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-SnowballTarget
---

# **ppo** Agent playing **SnowballTarget**

This is a trained model of a **ppo** agent playing **SnowballTarget**
using the [Unity ML-Agents Library](https://github.com/Unity-Technologies/ml-agents).

## Usage (with ML-Agents)

The Documentation: https://unity-technologies.github.io/ml-agents/ML-Agents-Toolkit-Documentation/

We wrote a complete tutorial on training your first agent with ML-Agents and publishing it to the Hub:

- A *short tutorial* where you teach Huggy the Dog 🐶 to fetch the stick and then play with him directly in your
browser: https://huggingface.co/learn/deep-rl-course/unitbonus1/introduction
- A *longer tutorial* to understand how ML-Agents works:
https://huggingface.co/learn/deep-rl-course/unit5/introduction

### Resume the training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
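
As a concrete illustration of the command above, here is a minimal sketch with the placeholders filled in. The configuration path and run ID are hypothetical values chosen for this example, not taken from this model's training run; substitute the ones you used originally. `--resume` is expected to pick up the run stored under the same run ID, so the ID must match the earlier training session.

```bash
# Hypothetical values: replace them with the configuration file and
# run ID from your own original training run.
CONFIG_PATH="./config/ppo/SnowballTarget.yaml"
RUN_ID="SnowballTarget1"

echo "Resuming run '$RUN_ID' with config '$CONFIG_PATH'"

# Only invoke the trainer if ML-Agents is installed in this environment.
if command -v mlagents-learn >/dev/null 2>&1; then
  mlagents-learn "$CONFIG_PATH" --run-id="$RUN_ID" --resume
else
  echo "mlagents-learn not found; install ML-Agents first" >&2
fi
```

Keeping the path and run ID in variables makes it harder to resume the wrong run by mistyping one of them.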

### Watch your Agent play

You can watch your agent **playing directly in your browser**:

1. If the environment is one of the official ML-Agents environments, go to https://huggingface.co/unity
2. Find your model_id: lambdavi/ppo-SnowballTarget
3. Select your `*.nn` or `*.onnx` file
4. Click on "Watch the agent play" 👀

### Hyperparameters used

```
SnowballTarget:
  trainer_type: ppo
  hyperparameters:
    batch_size: 128
    buffer_size: 2048
    learning_rate: 0.005
    beta: 0.005
    epsilon: 0.2
    lambd: 0.95
    num_epoch: 5
    shared_critic: False
    learning_rate_schedule: linear
    beta_schedule: linear
    epsilon_schedule: linear
  checkpoint_interval: 50000
  network_settings:
    normalize: False
    hidden_units: 256
    num_layers: 2
    vis_encode_type: simple
    memory: None
    goal_conditioning_type: hyper
    deterministic: False
  reward_signals:
    extrinsic:
      gamma: 0.99
      strength: 1.0
      network_settings:
        normalize: False
        hidden_units: 128
        num_layers: 2
        vis_encode_type: simple
        memory: None
        goal_conditioning_type: hyper
        deterministic: False
  init_path: None
  keep_checkpoints: 10
  even_checkpoints: False
  max_steps: 500000
  time_horizon: 64
  summary_freq: 10000
  threaded: True
  self_play: None
  behavioral_cloning: None
```
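
To get a feel for what these numbers imply, the following sketch does some back-of-the-envelope arithmetic on the values listed above. It assumes the PPO trainer splits each filled buffer into `buffer_size / batch_size` minibatches and passes over them `num_epoch` times, and that the buffer fills exactly every `buffer_size` steps; both are simplifications, so treat the results as rough estimates rather than exact counts.

```bash
# Values copied from the configuration listed above.
BATCH_SIZE=128
BUFFER_SIZE=2048
NUM_EPOCH=5
MAX_STEPS=500000

# Minibatches in one pass over a full buffer (assumed split, see lead-in).
MINIBATCHES_PER_EPOCH=$((BUFFER_SIZE / BATCH_SIZE))

# Gradient updates performed each time the buffer fills.
UPDATES_PER_BUFFER=$((MINIBATCHES_PER_EPOCH * NUM_EPOCH))

# Approximate number of buffer fills over the whole run (integer division).
BUFFER_FILLS=$((MAX_STEPS / BUFFER_SIZE))

echo "minibatches per epoch: $MINIBATCHES_PER_EPOCH"
echo "gradient updates per buffer: $UPDATES_PER_BUFFER"
echo "approximate buffer fills in the run: $BUFFER_FILLS"
```

Under these assumptions, each buffer of 2048 steps yields 16 minibatches per epoch and 80 gradient updates, and the 500000-step run fills the buffer roughly 244 times.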