metadata
tags:
- ML-Agents-Pyramids
- ppo
- deep-reinforcement-learning
- reinforcement-learning
- ml-agents
model-index:
- name: PPO
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: ML-Agents-Pyramids
      type: ML-Agents-Pyramids
    metrics:
    - type: mean_reward
      value: 5.10 +/- 0.85
      name: mean_reward
      verified: false
PPO Agent playing ML-Agents-Pyramids
This is a trained model of a PPO agent playing ML-Agents-Pyramids using Unity ML-Agents.
Usage
import torch

# Load the checkpoint on CPU. To run the policy you also need the
# network architecture that produced these weights.
checkpoint = torch.load("model.pt", map_location="cpu")

# The loaded policy can then be run in the Pyramids environment.
# See the repository for complete usage instructions.
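The layout of `model.pt` is not documented here, so a reasonable first step is to inspect what the checkpoint contains before rebuilding the network. The sketch below is a minimal, hypothetical helper (`summarize_checkpoint` is not part of this repository). It assumes the checkpoint is a dictionary, possibly nested, whose leaves are tensors or plain values, and it lists each entry with its shape when one is available.

```python
from collections.abc import Mapping


def summarize_checkpoint(checkpoint, prefix=""):
    """Return sorted (name, description) pairs for a possibly nested checkpoint dict.

    Hypothetical helper: assumes dict-like containers whose leaves are either
    tensor-like objects exposing `.shape` or plain Python values.
    """
    rows = []
    for key, value in checkpoint.items():
        name = f"{prefix}{key}"
        if isinstance(value, Mapping):
            # Recurse into nested dicts, such as a state dict stored under a key.
            rows.extend(summarize_checkpoint(value, prefix=f"{name}."))
        elif hasattr(value, "shape"):
            rows.append((name, f"shape={tuple(value.shape)}"))
        else:
            rows.append((name, type(value).__name__))
    return sorted(rows)


# Example with the object returned by torch.load("model.pt", map_location="cpu"):
# for name, description in summarize_checkpoint(checkpoint):
#     print(name, description)
```

If the file turns out to hold a whole pickled module rather than a dictionary, this helper does not apply and the object can be inspected directly instead.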
Training Results
- Mean reward: 5.10 ± 0.85
- Average pyramids completed: 5.0 per episode
- Training episodes: 3,000
- Target achievement: ✅ SUCCESS (target mean reward: 1.75)
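A figure such as "5.10 ± 0.85" is typically the mean and standard deviation of per-episode returns over a set of evaluation episodes. The sketch below shows one way to compute such a summary and compare it with the 1.75 target. The function name and the sample returns are illustrative assumptions, not data from this training run, and the card does not say whether the reported deviation is a population or sample estimate.

```python
import statistics


def summarize_returns(episode_returns, target=1.75):
    """Return (mean, std, reached_target) for a list of episode returns."""
    mean = statistics.fmean(episode_returns)
    # Population standard deviation; swap in statistics.stdev for the
    # sample estimate if that is what you want to report.
    std = statistics.pstdev(episode_returns)
    return mean, std, mean >= target


# Made-up returns for illustration only.
mean, std, reached = summarize_returns([4.0, 5.0, 6.0])
print(f"{mean:.2f} +/- {std:.2f}, target reached: {reached}")
```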
Algorithm Details
- Algorithm: Proximal Policy Optimization (PPO)
- Environment: ML-Agents-Pyramids
- Task: Multi-step pyramid completion with curiosity-driven exploration
- Network: Deep neural network with a curiosity mechanism
- Training Framework: PyTorch
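For readers unfamiliar with PPO, its core is the clipped surrogate objective, which limits how far a single update can move the policy away from the one that collected the data. The sketch below computes that term for one sample in plain Python. It is not taken from this model's training code, and the clip range of 0.2 is a commonly used default assumed here, not a value reported in this card.

```python
import math


def ppo_clipped_term(new_log_prob, old_log_prob, advantage, clip_eps=0.2):
    """Clipped surrogate objective for a single (state, action) sample."""
    # Probability ratio between the updated policy and the policy that
    # collected the data.
    ratio = math.exp(new_log_prob - old_log_prob)
    # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    # Take the more pessimistic of the unclipped and clipped estimates.
    return min(ratio * advantage, clipped_ratio * advantage)
```

During training this term is averaged over a batch and maximized. With a positive advantage the gain is capped once the ratio exceeds `1 + clip_eps`, and with a negative advantage the term stops changing once the ratio falls below `1 - clip_eps`.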
Task Description
The agent learns to:
- Find and press buttons to spawn pyramids
- Navigate to pyramids and knock them over
- Collect gold bricks from fallen pyramids
- Repeat efficiently to maximize score
This complex task requires:
- Exploration in a sparse-reward environment
- Multi-step planning and execution
- Spatial navigation and object interaction
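Because the environment reward is sparse, a curiosity mechanism adds an intrinsic bonus for transitions the agent predicts poorly, which pushes it toward unfamiliar states. The card does not specify which curiosity formulation was used, so the sketch below assumes an ICM-style bonus: half the squared error between a forward model's predicted next-state features and the actual ones. The function names and the `strength` value are illustrative assumptions.

```python
def intrinsic_reward(predicted_next_features, actual_next_features):
    """Half the squared forward-model prediction error (ICM-style bonus)."""
    if len(predicted_next_features) != len(actual_next_features):
        raise ValueError("feature vectors must have the same length")
    return 0.5 * sum(
        (p - a) ** 2
        for p, a in zip(predicted_next_features, actual_next_features)
    )


def combined_reward(extrinsic, intrinsic, strength=0.02):
    """Environment reward plus a scaled curiosity bonus.

    `strength` is an assumed illustrative weight, not a value from this card.
    """
    return extrinsic + strength * intrinsic


# A poorly predicted transition earns a bonus even when the environment
# itself gives no reward.
bonus = intrinsic_reward([1.0, 2.0], [0.0, 0.0])
total = combined_reward(0.0, bonus)
```

As the forward model improves on frequently visited states, the bonus there shrinks, so the signal fades where exploration is no longer needed.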
Performance Milestones
- Episodes 0-500: Learning basic movement and object interaction
- Episodes 500-1500: Developing pyramid completion strategy
- Episodes 1500-3000: Optimizing efficiency and consistency
Training Environment
- Environment: ML-Agents-Pyramids
- Framework: Custom PyTorch implementation with ML-Agents compatibility
- Training date: 2025-09-05
- Course: Hugging Face Deep RL Course Unit 5
This model was trained as part of the Hugging Face Deep RL Course.