
PPO Agent playing Pyramids

This is a trained model of a PPO agent playing Pyramids using the Unity ML-Agents Library.

Watch the Agent play

You can watch the agent playing directly in your browser:

1. Go to https://huggingface.co/spaces/unity/ML-Agents-Pyramids
2. Step 1: Find the model_id: Francesco-A/ppo-Pyramids-v1
3. Step 2: Select the .nn / .onnx file
4. Click on Watch the agent play

Resume the training

mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
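
As a minimal sketch, the command above can be assembled from Python once the placeholders are filled in. The configuration path and run ID below are hypothetical examples, not values taken from this model card. The run ID should match the one used for the original run so that the existing checkpoints are picked up.

```python
import shlex


def build_resume_command(config_path: str, run_id: str) -> list[str]:
    """Build the argument list for resuming an ML-Agents training run."""
    return ["mlagents-learn", config_path, f"--run-id={run_id}", "--resume"]


# Hypothetical placeholder values; substitute your own file and run ID.
command = build_resume_command("config/Pyramids.yaml", "Pyramids-run-1")

# Print a shell-safe version of the command for inspection.
# To launch it, pass `command` to subprocess.run(command, check=True).
print(shlex.join(command))
```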

Training hyperparameters

behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      beta: 0.01
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: linear
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
      vis_encode_type: simple
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
      rnd:
        gamma: 0.99
        strength: 0.01
        network_settings:
          hidden_units: 64
          num_layers: 3
        learning_rate: 0.0001
    keep_checkpoints: 5
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
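
To make a few of these settings concrete, the sketch below derives some quantities from the configuration above: how many minibatches one buffer yields, how many gradient steps each policy update performs, and how many summary rows the run should log. The `linear_lr` helper is an assumption for illustration. It models the linear schedule as decaying to exactly zero at `max_steps`, whereas the library may clamp to a small positive floor.

```python
# Values copied from the configuration above.
batch_size = 128
buffer_size = 2048
num_epoch = 3
learning_rate = 0.0003
max_steps = 1_000_000
summary_freq = 30_000

# Each filled buffer is split into minibatches of batch_size experiences.
minibatches_per_epoch = buffer_size // batch_size

# PPO passes over the buffer num_epoch times per policy update.
gradient_steps_per_update = minibatches_per_epoch * num_epoch

# One summary row is logged every summary_freq steps.
expected_summaries = max_steps // summary_freq


def linear_lr(step: int, initial: float = learning_rate, total: int = max_steps) -> float:
    """Assumed linear decay from `initial` at step 0 to zero at `total` steps."""
    return initial * max(0.0, 1.0 - step / total)


print("minibatches per epoch:", minibatches_per_epoch)
print("gradient steps per update:", gradient_steps_per_update)
print("expected summary rows:", expected_summaries)
print("learning rate halfway through:", linear_lr(max_steps // 2))
```

With these values, one buffer yields 16 minibatches and 48 gradient steps per update. The 33 expected summary rows match the number of rows in the training log.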

Training details

| Step | Time Elapsed (s) | Mean Reward | Std of Reward | Status |
|---|---|---|---|---|
| 30000 | 59.481 | -1.000 | 0.000 | Training |
| 60000 | 118.648 | -0.798 | 0.661 | Training |
| 90000 | 180.684 | -0.701 | 0.808 | Training |
| 120000 | 240.734 | -0.931 | 0.373 | Training |
| 150000 | 300.978 | -0.851 | 0.588 | Training |
| 180000 | 360.137 | -0.934 | 0.361 | Training |
| 210000 | 424.326 | -1.000 | 0.000 | Training |
| 240000 | 484.774 | -0.849 | 0.595 | Training |
| 270000 | 546.089 | -0.377 | 1.029 | Training |
| 300000 | 614.797 | -0.735 | 0.689 | Training |
| 330000 | 684.241 | -0.926 | 0.405 | Training |
| 360000 | 745.790 | -0.819 | 0.676 | Training |
| 390000 | 812.573 | -0.715 | 0.755 | Training |
| 420000 | 877.836 | -0.781 | 0.683 | Training |
| 450000 | 944.423 | -0.220 | 1.114 | Training |
| 480000 | 1010.918 | -0.484 | 0.962 | Training |
| 510000 | 1074.058 | -0.003 | 1.162 | Training |
| 540000 | 1138.848 | -0.021 | 1.222 | Training |
| 570000 | 1204.326 | 0.384 | 1.231 | Training |
| 600000 | 1276.488 | 0.690 | 1.174 | Training |
| 630000 | 1345.297 | 0.943 | 1.058 | Training |
| 660000 | 1412.791 | 1.014 | 1.043 | Training |
| 690000 | 1482.712 | 0.927 | 1.054 | Training |
| 720000 | 1548.726 | 0.900 | 1.128 | Training |
| 750000 | 1618.284 | 1.379 | 0.701 | Training |
| 780000 | 1692.080 | 1.567 | 0.359 | Training |
| 810000 | 1762.159 | 1.475 | 0.567 | Training |
| 840000 | 1832.166 | 1.438 | 0.648 | Training |
| 870000 | 1907.191 | 1.534 | 0.536 | Training |
| 900000 | 1977.521 | 1.552 | 0.478 | Training |
| 930000 | 2051.259 | 1.458 | 0.633 | Training |
| 960000 | 2126.498 | 1.545 | 0.586 | Training |
| 990000 | 2198.591 | 1.565 | 0.591 | Training |
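
The training log above can be summarized with a short script. The sketch below is illustrative only: it copies the step, elapsed time, mean reward, and reward standard deviation columns into a string, then reports the average throughput, the first step with a positive mean reward, and the step with the highest mean reward. The variable and function names are chosen for this example.

```python
# Columns: step, time elapsed (s), mean reward, std of reward.
LOG = """\
30000 59.481 -1.000 0.000
60000 118.648 -0.798 0.661
90000 180.684 -0.701 0.808
120000 240.734 -0.931 0.373
150000 300.978 -0.851 0.588
180000 360.137 -0.934 0.361
210000 424.326 -1.000 0.000
240000 484.774 -0.849 0.595
270000 546.089 -0.377 1.029
300000 614.797 -0.735 0.689
330000 684.241 -0.926 0.405
360000 745.790 -0.819 0.676
390000 812.573 -0.715 0.755
420000 877.836 -0.781 0.683
450000 944.423 -0.220 1.114
480000 1010.918 -0.484 0.962
510000 1074.058 -0.003 1.162
540000 1138.848 -0.021 1.222
570000 1204.326 0.384 1.231
600000 1276.488 0.690 1.174
630000 1345.297 0.943 1.058
660000 1412.791 1.014 1.043
690000 1482.712 0.927 1.054
720000 1548.726 0.900 1.128
750000 1618.284 1.379 0.701
780000 1692.080 1.567 0.359
810000 1762.159 1.475 0.567
840000 1832.166 1.438 0.648
870000 1907.191 1.534 0.536
900000 1977.521 1.552 0.478
930000 2051.259 1.458 0.633
960000 2126.498 1.545 0.586
990000 2198.591 1.565 0.591
"""


def parse_log(text: str) -> list[tuple[int, float, float, float]]:
    """Parse whitespace-separated rows into (step, seconds, mean, std) tuples."""
    rows = []
    for line in text.splitlines():
        step, seconds, mean, std = line.split()
        rows.append((int(step), float(seconds), float(mean), float(std)))
    return rows


rows = parse_log(LOG)

# Average environment steps per second over the logged run.
last_step, last_seconds = rows[-1][0], rows[-1][1]
steps_per_second = last_step / last_seconds

# First summary at which the mean reward is above zero.
first_positive_step = next(step for step, _, mean, _ in rows if mean > 0)

# Summary row with the highest mean reward.
best_step, _, best_mean, _ = max(rows, key=lambda row: row[2])

print(f"throughput: {steps_per_second:.1f} steps/s")
print(f"first positive mean reward at step {first_positive_step}")
print(f"best mean reward {best_mean:.3f} at step {best_step}")
```

On this log, the mean reward first turns positive at step 570000 and peaks at 1.567 at step 780000, with an average throughput of roughly 450 steps per second.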