metadata
tags:
  - ML-Agents-Pyramids
  - ppo
  - deep-reinforcement-learning
  - reinforcement-learning
  - ml-agents
model-index:
  - name: PPO
    results:
      - task:
          type: reinforcement-learning
          name: reinforcement-learning
        dataset:
          name: ML-Agents-Pyramids
          type: ML-Agents-Pyramids
        metrics:
          - type: mean_reward
            value: 5.10 +/- 0.85
            name: mean_reward
            verified: false

PPO Agent playing ML-Agents-Pyramids

This is a trained model of a PPO agent playing ML-Agents-Pyramids using Unity ML-Agents.

Usage

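import torch

# Load the checkpoint on CPU. Depending on how it was saved, the file may hold
# a state_dict (weights only) or a full pickled module. A state_dict requires
# rebuilding the network architecture before calling load_state_dict().
checkpoint = torch.load("model.pt", map_location="cpu")

# Inspect the top-level contents to see which of the two cases applies.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys()))

# The model can be used with the Pyramids environment.
# See the repository for complete usage instructions.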

Training Results

  • Mean reward: 5.10 ± 0.85
  • Average pyramids completed: 5.0 per episode
  • Training episodes: 3,000
  • Target achievement: ✅ SUCCESS (mean reward 5.10 vs. target 1.75)
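
For reference, a "mean ± std" figure like the one above can be computed from per-episode evaluation rewards and compared against the target. The snippet below is a minimal sketch: the `summarize` helper and the sample rewards are hypothetical (they are not this model's logs), and the card does not say whether the reported spread is a population or sample standard deviation, so the population version is assumed here.

```python
import statistics

TARGET_REWARD = 1.75  # target stated in the results above


def summarize(episode_rewards, target=TARGET_REWARD):
    """Return (mean, std, passed) for a list of per-episode rewards."""
    mean = statistics.fmean(episode_rewards)
    # Population standard deviation (an assumption); use statistics.stdev
    # for the sample version.
    std = statistics.pstdev(episode_rewards)
    return mean, std, mean >= target


# Hypothetical evaluation rewards, for illustration only.
rewards = [4.0, 5.0, 6.0, 5.0]
mean, std, passed = summarize(rewards)
print(f"mean_reward: {mean:.2f} +/- {std:.2f} (target met: {passed})")
```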

Algorithm Details

  • Algorithm: Proximal Policy Optimization (PPO)
  • Environment: ML-Agents-Pyramids
  • Task: Multi-step pyramid completion with curiosity-driven exploration
  • Network: Deep neural network with curiosity mechanism
  • Training Framework: PyTorch
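
As a reminder of what PPO optimizes, here is a minimal sketch of the clipped surrogate loss in plain Python. It is not taken from this model's training code: the function name `ppo_clip_loss` and the default `clip_eps=0.2` are assumptions for illustration, and a real implementation would operate on PyTorch tensors and add value and entropy terms.

```python
import math


def ppo_clip_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate loss (to minimize), averaged over a batch of samples."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio pi_new(a|s) / pi_old(a|s), computed from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Keep the ratio inside [1 - clip_eps, 1 + clip_eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # PPO takes the pessimistic (smaller) of the two surrogate terms.
        total += min(ratio * adv, clipped * adv)
    # Negate because the surrogate objective is maximized.
    return -total / len(advantages)
```

With a positive advantage, the clip removes any incentive to push the ratio above `1 + clip_eps`; with a negative advantage, the unclipped term is kept when it is the worse of the two, so the penalty stays in place.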

Task Description

The agent learns to:

  1. Find and press buttons to spawn pyramids
  2. Navigate to pyramids and knock them over
  3. Collect gold bricks from fallen pyramids
  4. Repeat efficiently to maximize score

This complex task requires:

  • Exploration in a sparse-reward environment
  • Multi-step planning and execution
  • Spatial navigation and object interaction
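
One common way to drive exploration under sparse rewards is to add a curiosity bonus based on how badly a learned forward model predicts the next state. The sketch below illustrates that idea only. This card does not specify which curiosity mechanism was used, so the function names, the squared-error form, and the `strength` value are all assumptions.

```python
def intrinsic_reward(predicted_next, actual_next):
    """Half the squared error between predicted and observed next-state features."""
    return 0.5 * sum((p - a) ** 2 for p, a in zip(predicted_next, actual_next))


def total_reward(extrinsic, predicted_next, actual_next, strength=0.02):
    """Extrinsic reward plus a scaled curiosity bonus (strength is hypothetical)."""
    return extrinsic + strength * intrinsic_reward(predicted_next, actual_next)


# A poorly predicted transition earns a bonus even when the extrinsic reward is zero.
bonus_only = total_reward(0.0, [1.0, 2.0], [1.0, 0.0], strength=0.5)
print(bonus_only)
```

States the forward model already predicts well yield almost no bonus, which nudges the agent toward unfamiliar parts of the environment such as unpressed buttons.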

Performance Milestones

  • Episodes 0-500: Learning basic movement and object interaction
  • Episodes 500-1500: Developing pyramid completion strategy
  • Episodes 1500-3000: Optimizing efficiency and consistency

Training Environment

  • Environment: ML-Agents-Pyramids
  • Framework: Custom PyTorch implementation with ML-Agents compatibility
  • Training date: 2025-09-05
  • Course: Hugging Face Deep RL Course Unit 5

This model was trained as part of the Hugging Face Deep RL Course.