---
library_name: ml-agents
tags:
  - Pyramids
  - deep-reinforcement-learning
  - reinforcement-learning
  - ML-Agents-Pyramids
  - PPO
  - Unity
model-index:
  - name: PPO-PyramidsTraining6
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: Pyramids
          type: Unity-MLAgents-Env
        metrics:
          - type: mean_reward
            value: 1.381
            name: mean_reward
            verified: false
          - type: std_reward
            value: 0
            name: std_reward
            verified: false
---

๐Ÿ›๏ธ PPO Agent on Pyramids

This repository contains a trained Proximal Policy Optimization (PPO) agent that plays the Pyramids environment using the Unity ML-Agents Library.


## 📊 Model Card

- **Model Name:** ppo-PyramidsTraining
- **Environment:** Pyramids (Unity ML-Agents)
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Performance:**
  - Mean reward of 1.381 on the Pyramids environment (self-reported, not verified)
  - Demonstrates convergence to an effective policy for navigating and solving pyramid-based tasks

## 🚀 Usage (with ML-Agents)

Documentation: ML-Agents Toolkit Docs

### Resume Training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
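The configuration file passed to `mlagents-learn` is a YAML trainer config. A sketch of a typical PPO setup for Pyramids, modeled on the example configs shipped with ML-Agents (the values below are illustrative defaults, not necessarily the hyperparameters used for this run):

```yaml
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 0.0003
      epsilon: 0.2        # PPO clip ratio
      lambd: 0.95
      num_epoch: 3
    network_settings:
      normalize: false
      hidden_units: 512
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
```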

### Load and Run

```python
# Example: downloading the trained PPO model from the Hub
# (running it requires a Unity ML-Agents setup)
from huggingface_hub import snapshot_download

model_id = "KraTUZen/ppo-PyramidsTraining"
local_dir = snapshot_download(repo_id=model_id)
# Select the .nn or .onnx file from the downloaded repository
```

## 🧠 Notes

- The agent is trained using PPO, a robust on-policy algorithm widely used in Unity ML-Agents.
- The environment involves pyramid navigation and puzzle-solving, requiring precision and strategy.
- The trained model is stored as `.nn` or `.onnx` files for direct Unity integration.

## 📂 Repository Structure

- `Pyramids.nn` / `Pyramids.onnx` → trained PPO policy
- `README.md` → documentation and usage guide

## ✅ Results

- The agent learns to navigate pyramid structures and solve tasks efficiently.
- Demonstrates stable training and effective policy convergence using PPO.

## 🔎 Environment Overview

- **Observation Space:** ray-cast perception of the surroundings plus agent state (position, velocity)
- **Action Space:** discrete (move forward/backward, rotate left/right)
- **Objective:** solve the pyramid task (find the switch, topple the pyramid, collect the gold brick) and maximize reward
- **Reward:** positive reward for successful task completion, with a small per-step penalty that encourages efficiency
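The reward signal above is turned into a learning target through discounted returns. A minimal sketch of that bookkeeping (standard RL arithmetic, not ML-Agents internals; the `gamma` value is illustrative):

```python
def discounted_return(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    by accumulating backwards over an episode's rewards."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# A reward far in the future contributes less than an immediate one:
print(discounted_return([0.0, 0.0, 2.0], gamma=0.5))  # 2.0 * 0.5^2 = 0.5
```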

## 📚 Learning Highlights

- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** clipped surrogate objective to ensure stable updates
- **Strengths:** robust, stable, widely used in Unity ML-Agents
- **Limitations:** requires careful tuning of hyperparameters (clip ratio, learning rate, batch size)
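The clipped surrogate objective mentioned above can be sketched for a single sample in plain Python (an illustrative sketch of the PPO formula, not the ML-Agents implementation; in practice this is averaged over a batch and maximized by gradient ascent):

```python
def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO objective for one sample: min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r is the new/old policy probability ratio and A the advantage."""
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)

# With a positive advantage, the objective stops rewarding ratios beyond 1 + eps:
print(clipped_surrogate(1.5, 1.0))  # 1.2, not 1.5
```

Because the clip caps how much any one update can move the policy, PPO avoids the destructive large steps that plain policy-gradient methods can take.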

## 🎮 Watch Your Agent Play

You can watch your agent directly in your browser:

1. Visit Unity ML-Agents on Hugging Face
2. Find your model ID: `KraTUZen/ppo-PyramidsTraining`
3. Select your `.nn` or `.onnx` file
4. Click **Watch the agent play** 👀