---
library_name: ml-agents
tags:
- Pyramids
- deep-reinforcement-learning
- reinforcement-learning
- ML-Agents-Pyramids
- PPO
- Unity
model-index:
- name: PPO-PyramidsTraining6
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: Pyramids
      type: Unity-MLAgents-Env
    metrics:
    - type: mean_reward
      value: 1.381
      name: mean_reward
      verified: false
    - type: std_reward
      value: 0
      name: std_reward
      verified: false
---
# PPO Agent on Pyramids

This repository contains a trained Proximal Policy Optimization (PPO) agent that plays the **Pyramids** environment using the [Unity ML-Agents](https://github.com/Unity-Technologies/ml-agents) library.
## Model Card

- **Model Name:** ppo-PyramidsTraining
- **Environment:** Pyramids (Unity ML-Agents)
- **Algorithm:** PPO (Proximal Policy Optimization)
- **Performance:**
  - Achieves stable performance in navigating and solving pyramid-based tasks
  - Converges to an effective policy
## Usage (with ML-Agents)

Documentation: [ML-Agents Toolkit Docs](https://github.com/Unity-Technologies/ml-agents/tree/main/docs)

### Resume Training

```bash
mlagents-learn <your_configuration_file_path.yaml> --run-id=<run_id> --resume
```
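The `mlagents-learn` command expects a trainer configuration file. A minimal PPO configuration for Pyramids might look like the sketch below; the hyperparameter values are illustrative defaults in the style of the ML-Agents example configs, not necessarily the exact settings used to train this model.

```yaml
behaviors:
  Pyramids:
    trainer_type: ppo
    hyperparameters:
      batch_size: 128
      buffer_size: 2048
      learning_rate: 3.0e-4
      epsilon: 0.2        # PPO clip ratio
      lambd: 0.95         # GAE lambda
      num_epoch: 3
    network_settings:
      hidden_units: 512
      num_layers: 2
    reward_signals:
      extrinsic:
        gamma: 0.99
        strength: 1.0
    max_steps: 1000000
    time_horizon: 128
    summary_freq: 30000
```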
### Load and Run

A minimal sketch for fetching the trained policy file from the Hub (assumes the `huggingface_hub` package is installed):

```python
from huggingface_hub import hf_hub_download

# Download the trained PPO policy from this repository
model_path = hf_hub_download(
    repo_id="KraTUZen/ppo-PyramidsTraining",
    filename="Pyramids.onnx",  # or the .nn file, depending on the repo contents
)
# Import the downloaded file into your Unity project as the agent's model
```
## Notes

- The agent is trained using PPO, a robust on-policy algorithm widely used in Unity ML-Agents.
- The environment involves pyramid navigation and puzzle-solving, requiring precision and strategy.
- The trained model is stored as `.nn` or `.onnx` files for direct Unity integration.
## Repository Structure

- `Pyramids.nn` / `Pyramids.onnx` – Trained PPO policy
- `README.md` – Documentation and usage guide
## Results
- The agent learns to navigate pyramid structures and solve tasks efficiently.
- Demonstrates stable training and effective policy convergence using PPO.
## Environment Overview
- **Observation Space:** Vector observations (ray-cast detections of the surroundings plus agent state)
- **Action Space:** Discrete (movement and rotation)
- **Objective:** Solve pyramid-based tasks and maximize cumulative reward
- **Reward:** Positive reward for successful task completion, penalties for failures
## Learning Highlights

- **Algorithm:** PPO (Proximal Policy Optimization)
- **Update Rule:** Clipped surrogate objective to keep policy updates stable
- **Strengths:** Robust, stable, widely used in Unity ML-Agents
- **Limitations:** Requires careful tuning of hyperparameters (clip ratio, learning rate, batch size)
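The clipped surrogate objective mentioned above can be sketched in a few lines of NumPy. This is a simplified, batch-level illustration of the PPO update rule, not ML-Agents' internal implementation:

```python
import numpy as np

def ppo_clipped_objective(ratio, advantage, clip_eps=0.2):
    """Clipped surrogate objective: E[min(r*A, clip(r, 1-eps, 1+eps)*A)].

    ratio     -- new_policy_prob / old_policy_prob for each sampled action
    advantage -- advantage estimate for each sampled action
    clip_eps  -- PPO clip ratio (epsilon in the config)
    """
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage
    # The elementwise minimum caps how much credit a single update can
    # take for moving the policy, which is what stabilizes PPO training.
    return float(np.minimum(unclipped, clipped).mean())
```

For example, with `clip_eps=0.2` a probability ratio of 1.5 on a positive advantage contributes only as if it were 1.2, limiting the effective step size of the update.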
## Watch Your Agent Play

You can watch the agent directly in your browser:

1. Visit Unity ML-Agents on Hugging Face
2. Find the model ID: `KraTUZen/ppo-PyramidsTraining`
3. Select your `.nn` or `.onnx` file
4. Click **Watch the agent play**