---
library_name: ml-agents
tags:
  - Pyramids
  - deep-reinforcement-learning
  - reinforcement-learning
  - ML-Agents-Pyramids
  - PPO
  - Unity
model-index:
  - name: PPO-PyramidsTraining6
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: Pyramids
          type: Unity-MLAgents-Env
        metrics:
          - type: mean_reward
            value: 1.381
            name: mean_reward
            verified: false
          - type: std_reward
            value: 0
            name: std_reward
            verified: false
---

๐Ÿ›๏ธ PPO Agent on Pyramids

This repository contains a trained Proximal Policy Optimization (PPO) agent that plays the Pyramids environment using the Unity ML-Agents Library.


## 📊 Model Card

- **Model Name:** ppo-PyramidsTraining
- **Environment:** Pyramids (Unity ML-Agents)
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Performance:**
  - Mean reward of 1.381 on the Pyramids environment (self-reported, not verified)
  - Demonstrates convergence to an effective policy for navigating and solving pyramid-based tasks

## 🚀 Usage (with ML-Agents)

Documentation: ML-Agents Toolkit Docs

### Resume Training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
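The configuration file passed to `mlagents-learn` is a YAML trainer config. A sketch of a typical PPO setup for Pyramids, modeled on the example configs shipped with ML-Agents (the values below are illustrative defaults, not necessarily the hyperparameters used for this run):

```yaml
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      epsilon: 0.2        # PPO clip ratio
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
```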

### Load and Run

```python
# Example: downloading the trained PPO model from the Hub
# (running it requires a Unity ML-Agents setup)
from huggingface_hub import snapshot_download

model_id = "KraTUZen/ppo-PyramidsTraining"
local_dir = snapshot_download(repo_id=model_id)
# Select the .nn or .onnx file from the downloaded repository
```

## 🧠 Notes

- The agent is trained using PPO, a robust on-policy algorithm widely used in Unity ML-Agents.
- The environment involves pyramid navigation and puzzle-solving, requiring precision and strategy.
- The trained model is stored as `.nn` or `.onnx` files for direct Unity integration.

## 📂 Repository Structure

- `Pyramids.nn` / `Pyramids.onnx` → trained PPO policy
- `README.md` → documentation and usage guide

## ✅ Results

- The agent learns to navigate pyramid structures and solve tasks efficiently.
- Demonstrates stable training and effective policy convergence using PPO.

## 🔎 Environment Overview

- **Observation Space:** ray-cast perception of the surroundings plus agent state (position, velocity)
- **Action Space:** discrete (move forward/backward, rotate left/right)
- **Objective:** solve the pyramid task (find the switch, topple the pyramid, collect the gold brick) and maximize reward
- **Reward:** positive reward for successful task completion, with a small per-step penalty that encourages efficiency
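The reward signal above is turned into a learning target through discounted returns. A minimal sketch of that bookkeeping (standard RL arithmetic, not ML-Agents internals; the `gamma` value is illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    by accumulating backwards over an episode's rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward far in the future contributes less than an immediate one:
print(discounted_return([0.0, 0.0, 2.0], gamma=0.5))  # 2.0 * 0.5^2 = 0.5
```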

## 📚 Learning Highlights

- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** clipped surrogate objective to ensure stable updates
- **Strengths:** robust, stable, widely used in Unity ML-Agents
- **Limitations:** requires careful tuning of hyperparameters (clip ratio, learning rate, batch size)
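The clipped surrogate objective mentioned above can be sketched for a single sample in plain Python (an illustrative sketch of the PPO formula, not the ML-Agents implementation; in practice this is averaged over a batch and maximized by gradient ascent):

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, the objective stops rewarding ratios beyond 1 + eps:
print(clipped_surrogate(1.5, 1.0))  # 1.2, not 1.5
```

Because the clip caps how much any one update can move the policy, PPO avoids the destructive large steps that plain policy-gradient methods can take.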

## 🎮 Watch Your Agent Play

You can watch your agent directly in your browser:

1. Visit Unity ML-Agents on Hugging Face
2. Find your model ID: `KraTUZen/ppo-PyramidsTraining`
3. Select your `.nn` or `.onnx` file
4. Click **Watch the agent play** 👀