POCA SoccerTwos Self-Play Agent

This repository contains a trained POCA self-play agent for the SoccerTwos environment.

The model was trained using:

MA-POCA (Multi-Agent POsthumous Credit Assignment), a multi-agent trainer with a centralized critic
Self-play training
Unity ML-Agents
Multi-agent reinforcement learning

Environment

The agent was trained on the official Unity SoccerTwos environment.

Training Details

Trainer

POCA (multi-agent centralized critic)

Environment

SoccerTwos
2v2 competitive soccer environment

Training Setup

Self-play enabled
Multi-agent cooperative + competitive training
Long-horizon training (time_horizon of 1000 steps, max_steps of 50,000,000)
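
The self-play setup listed above can be pictured with a small sketch. This is not the ML-Agents implementation, only an illustration of the general idea: the learning team sometimes plays against its own latest policy and sometimes against an older saved snapshot. The class name SnapshotPool, the window of 10 snapshots, and the 0.5 ratio are hypothetical values chosen for illustration; none of them come from this repository's configuration.

```python
import random
from collections import deque


class SnapshotPool:
    """Hypothetical pool of past policy snapshots used as self-play opponents."""

    def __init__(self, window=10, latest_ratio=0.5, seed=None):
        # Keep only the most recent `window` snapshots (assumed value).
        self.snapshots = deque(maxlen=window)
        # Probability of facing the current policy instead of a snapshot (assumed value).
        self.latest_ratio = latest_ratio
        self.rng = random.Random(seed)

    def save(self, policy_id):
        """Store a snapshot; the oldest one is dropped once the window is full."""
        self.snapshots.append(policy_id)

    def sample_opponent(self, current_policy_id):
        """Pick the current policy or a uniformly random stored snapshot."""
        if not self.snapshots or self.rng.random() < self.latest_ratio:
            return current_policy_id
        return self.rng.choice(list(self.snapshots))


if __name__ == "__main__":
    pool = SnapshotPool(window=3, latest_ratio=0.5, seed=0)
    for step in range(5):
        pool.save(f"snapshot-{step}")
    print(pool.sample_opponent("current"))
```

Playing against a mix of old and new opponents is a common way to keep a self-play agent from overfitting to its most recent behaviour.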

Hyperparameters

behaviors:
  SoccerTwos:
    trainer_type: poca

    hyperparameters:
      batch_size: 2048
      buffer_size: 20480
      learning_rate: 0.0003
      beta: 0.005
      epsilon: 0.2
      lambd: 0.95
      num_epoch: 3
      learning_rate_schedule: constant

    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
      vis_encode_type: simple

    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0

    keep_checkpoints: 5
    max_steps: 50000000
    time_horizon: 1000
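
A few quantities implied by these hyperparameters can be worked out with simple arithmetic. The sketch below copies the values from the configuration above and assumes that each epoch sweeps the whole buffer once in batch_size chunks; the function names are illustrative and not part of ML-Agents.

```python
# Values copied from the trainer configuration above.
config = {
    "batch_size": 2048,
    "buffer_size": 20480,
    "num_epoch": 3,
    "gamma": 0.99,
}


def minibatches_per_update(cfg):
    """Minibatch gradient steps per policy update.

    Assumes every epoch passes over the full buffer once,
    split into chunks of batch_size experiences.
    """
    return (cfg["buffer_size"] // cfg["batch_size"]) * cfg["num_epoch"]


def effective_horizon(cfg):
    """Rough number of steps over which rewards matter, 1 / (1 - gamma)."""
    return 1.0 / (1.0 - cfg["gamma"])


if __name__ == "__main__":
    print("minibatches per update:", minibatches_per_update(config))
    print("effective reward horizon:", round(effective_horizon(config)))
```

Under that assumption, each update performs 30 minibatch steps (10 minibatches per epoch over 3 epochs), and a gamma of 0.99 gives an effective reward horizon of roughly 100 steps.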
