# Q-Learning Model for CartPole
This project implements a Q-learning model for the CartPole-v1 environment using Gymnasium. The agent is trained to balance a pole on a moving cart by learning optimal actions through trial and error. The learning process uses an epsilon-greedy strategy, where the agent explores random actions at the beginning and gradually shifts towards exploiting learned actions as training progresses.
## Key Features

- **Discretization:** Continuous state variables (cart position, cart velocity, pole angle, and pole angular velocity) are discretized into bins so that tabular Q-learning can be applied efficiently.
- **Q-learning algorithm:** The agent updates its Q-values via the Bellman update, learning from the reward it receives after each action (see the sketch after this list).
- **Epsilon-greedy strategy:** The agent balances exploration and exploitation, starting with mostly random actions and gradually favoring learned actions as training progresses.
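
A minimal sketch of how these three pieces fit together for a single environment step. The bin edges, hyperparameters, and helper names here are illustrative assumptions, not necessarily the values used in `train.py`; the actual script loops this over many episodes and decays epsilon.

```python
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")

# Illustrative bin edges for the four state variables; the actual
# ranges and bin counts used in train.py may differ.
bins = [
    np.linspace(-2.4, 2.4, 10),    # cart position
    np.linspace(-3.0, 3.0, 10),    # cart velocity
    np.linspace(-0.21, 0.21, 10),  # pole angle (radians)
    np.linspace(-3.0, 3.0, 10),    # pole angular velocity
]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(o, b)) for o, b in zip(obs, bins))

# One Q-value per (discretized state, action) pair.
q_table = np.zeros([len(b) + 1 for b in bins] + [env.action_space.n])

alpha, gamma, epsilon = 0.1, 0.99, 1.0  # assumed hyperparameters

obs, _ = env.reset()
s = discretize(obs)

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the current Q-values.
if np.random.random() < epsilon:
    a = env.action_space.sample()
else:
    a = int(np.argmax(q_table[s]))

obs, reward, terminated, truncated, _ = env.step(a)
s_next = discretize(obs)

# Bellman update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
q_table[s + (a,)] += alpha * (
    reward + gamma * np.max(q_table[s_next]) - q_table[s + (a,)]
)
```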
## Files

- `train.py`: Code for training the agent.
- `cartPole_qtable.npy`: The trained Q-table.
- `replay.mp4`: A video showing the agent's performance.
## How to Reproduce

1. Install the dependencies:

   ```bash
   pip install gymnasium numpy imageio
   ```

2. Run the training script:

   ```bash
   python train.py
   ```

3. Use the saved Q-table (`cartPole_qtable.npy`) to evaluate the model, as sketched below.
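
A minimal evaluation sketch, assuming the Q-table was saved with `np.save` and the discretization matches the one used during training (both are assumptions; adapt the bin edges to whatever `train.py` actually uses):

```python
import numpy as np
import gymnasium as gym

# Bin edges must match the ones used during training; these are the
# same illustrative values as in the training sketch above.
bins = [
    np.linspace(-2.4, 2.4, 10),
    np.linspace(-3.0, 3.0, 10),
    np.linspace(-0.21, 0.21, 10),
    np.linspace(-3.0, 3.0, 10),
]

def discretize(obs):
    return tuple(int(np.digitize(o, b)) for o, b in zip(obs, bins))

q_table = np.load("cartPole_qtable.npy")

env = gym.make("CartPole-v1")
obs, _ = env.reset()
total_reward, done = 0.0, False

while not done:
    a = int(np.argmax(q_table[discretize(obs)]))  # greedy action, no exploration
    obs, reward, terminated, truncated, _ = env.step(a)
    total_reward += reward
    done = terminated or truncated

print(f"Episode reward: {total_reward}")
```

A well-trained agent should reach the CartPole-v1 reward cap of 500 on most episodes; lower scores suggest the bin edges here do not match the ones used in training.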