---
license: mit
library_name: stable-baselines3
tags:
- dqn
- Reinforcement Learning
- Atari
- Pac-Man
---

# Agent using DQN to play ALE/Pacman-v5

## UPDATE 16 May 2024: Latest DQN model is version 2.8

This agent was trained using Stable Baselines3 as part of the capstone project for South Hills School in Spring 2024.
The goal of the project is to gain familiarity with reinforcement learning concepts and tools, and to train an agent that scores in the 400-500 point range in Pacman.

## Description of Python scripts

To run a script, first ensure that Python is installed. Then, from the root directory of the repository, run `python <script_name> <options>`.
For a list of available options, run `python <script_name> --help`.
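
As an illustration of how a script can expose options and a `--help` listing, here is a minimal sketch using Python's standard `argparse` module. The option names (`--agent`, `--episodes`) and the `build_parser` helper are hypothetical and are not taken from the scripts in this repository.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a command-line parser; every option name here is hypothetical."""
    parser = argparse.ArgumentParser(
        description="Example of a script that takes an agent path and options."
    )
    # Hypothetical option: path to a saved agent (.zip file).
    parser.add_argument("--agent", required=True, help="path to the saved agent zip file")
    # Hypothetical option: number of episodes to run.
    parser.add_argument("--episodes", type=int, default=10, help="number of episodes to run")
    return parser


# Parse an explicit argument list so the example runs without a real command line.
args = build_parser().parse_args(["--agent", "my_agent.zip", "--episodes", "5"])
print(f"agent={args.agent} episodes={args.episodes}")
```

With a parser like this, `argparse` generates the `--help` output automatically from the `help` strings.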

### watch_agent.py

Renders the specified agent in real time. This script does not save any evaluation information.

### evaluate_agent.py

Evaluates the specified agent and appends the results to the specified log file.
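
The idea of appending results to a log file can be sketched with the standard library alone. This is not the script's actual implementation: the `append_evaluation` helper, the JSON-lines layout, and the field names are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from statistics import mean, pstdev


def append_evaluation(log_path, agent_name, episode_scores):
    """Append one JSON line summarizing an evaluation run to log_path.

    The record layout is hypothetical, not the format used by evaluate_agent.py.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent": agent_name,
        "episodes": len(episode_scores),
        "mean_score": mean(episode_scores),
        "std_score": pstdev(episode_scores),  # population standard deviation
    }
    path = Path(log_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" appends, so results from earlier evaluations are preserved.
    with path.open("a", encoding="utf-8") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record
```

The per-episode scores could come, for example, from Stable Baselines3's `evaluate_policy` called with `return_episode_rewards=True`; whether the script does this is not stated here.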

### get_config.py

Pulls configuration information from the specified agent and saves it in JSON format.
The information is read from the data file in the agent's zip file, and the serialized data is stripped out to make the result more human-readable.
By default, the output file is saved to the directory from which the command is run. Best practice is to save the file to the agent's directory.
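
A rough sketch of this kind of extraction is shown below. It assumes the agent zip contains an entry named `data` holding JSON, and that serialized objects are stored under a `":serialized:"` key, which matches the Stable Baselines3 save format as far as I know. The helper names are hypothetical and the real script may work differently.

```python
import json
import zipfile


def strip_serialized(value):
    """Recursively drop ':serialized:' entries from nested dicts and lists."""
    if isinstance(value, dict):
        return {
            key: strip_serialized(item)
            for key, item in value.items()
            if key != ":serialized:"
        }
    if isinstance(value, list):
        return [strip_serialized(item) for item in value]
    return value


def read_agent_config(agent_zip_path):
    """Read the 'data' entry of a saved agent zip, minus serialized blobs."""
    # Assumption: the archive has an entry called 'data' that contains JSON.
    with zipfile.ZipFile(agent_zip_path) as archive:
        raw = archive.read("data").decode("utf-8")
    return strip_serialized(json.loads(raw))


def save_agent_config(agent_zip_path, output_path):
    """Write the cleaned configuration to output_path as indented JSON."""
    config = read_agent_config(agent_zip_path)
    with open(output_path, "w", encoding="utf-8") as out_file:
        json.dump(config, out_file, indent=2)
    return config
```

Dropping only the `":serialized:"` entries keeps the human-readable fields next to them, such as type names, in the output.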

### record_video.py

Records a video of the specified agent being evaluated. This script does not save any evaluation information.
Currently in major development and located in the development branch.

### plot_evaluations.py

Uses Matplotlib to plot the evaluation data gathered during the training run of the specified agent.
Charts can be saved to a directory of the user's choosing.
Currently in major development and located in the development branch.

### plot_improvement.py

Plots the score of an agent averaged over all evaluation episodes during a training run, along with the standard deviation. The lowest and highest episode scores are removed from each evaluation.
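
The trimming step can be sketched as follows. This is only an illustration: the `trimmed_stats` helper and the sample scores are hypothetical, and the use of the population standard deviation is an assumption, since the script's exact formula is not stated here.

```python
from statistics import mean, pstdev


def trimmed_stats(episode_scores):
    """Return (mean, std) of the scores after dropping one lowest and one highest."""
    if len(episode_scores) < 3:
        raise ValueError("need at least 3 episode scores to trim both extremes")
    # Sorting puts the lowest score first and the highest last; slice both off.
    trimmed = sorted(episode_scores)[1:-1]
    # Assumption: population standard deviation (the script may use another form).
    return mean(trimmed), pstdev(trimmed)


# Hypothetical data: each inner list holds the episode scores of one evaluation.
evaluations = [
    [120, 200, 210, 190, 600],
    [250, 300, 310, 290, 90],
]
for scores in evaluations:
    avg, std = trimmed_stats(scores)
    print(f"mean={avg:.1f} std={std:.1f}")
```

Trimming one score from each end reduces the influence of a single unusually good or bad episode on the plotted average.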