# Q-Learning Model for CartPole
This project implements a Q-learning model for the CartPole-v1 environment using Gymnasium. The agent is trained to balance a pole on a moving cart by learning optimal actions through trial and error. The learning process uses an epsilon-greedy strategy, where the agent explores random actions at the beginning and gradually shifts towards exploiting learned actions as training progresses.
## Key Features

- **Discretization:** Continuous state variables (cart position, cart velocity, pole angle, and pole angular velocity) are discretized into bins so that tabular Q-learning can be applied efficiently.
- **Q-learning algorithm:** The agent updates its Q-values via the Bellman update, learning from the reward it receives after each action (see the sketch after this list).
- **Epsilon-greedy strategy:** The agent balances exploration and exploitation, starting with mostly random actions and gradually favoring learned actions as training progresses.
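
A minimal sketch of how these three pieces fit together for a single environment step. The bin edges, hyperparameters, and helper names here are illustrative assumptions, not necessarily the values used in `train.py`; the actual script loops this over many episodes and decays epsilon.

```python
import numpy as np
import gymnasium as gym

env = gym.make("CartPole-v1")

# Illustrative bin edges for the four state variables; the actual
# ranges and bin counts used in train.py may differ.
bins = [
    np.linspace(-2.4, 2.4, 10),    # cart position
    np.linspace(-3.0, 3.0, 10),    # cart velocity
    np.linspace(-0.21, 0.21, 10),  # pole angle (radians)
    np.linspace(-3.0, 3.0, 10),    # pole angular velocity
]

def discretize(obs):
    """Map a continuous observation to a tuple of bin indices."""
    return tuple(int(np.digitize(o, b)) for o, b in zip(obs, bins))

# One Q-value per (discretized state, action) pair.
q_table = np.zeros([len(b) + 1 for b in bins] + [env.action_space.n])

alpha, gamma, epsilon = 0.1, 0.99, 1.0  # assumed hyperparameters

obs, _ = env.reset()
s = discretize(obs)

# Epsilon-greedy action selection: explore with probability epsilon,
# otherwise exploit the current Q-values.
if np.random.random() < epsilon:
    a = env.action_space.sample()
else:
    a = int(np.argmax(q_table[s]))

obs, reward, terminated, truncated, _ = env.step(a)
s_next = discretize(obs)

# Bellman update: move Q(s, a) toward reward + gamma * max_a' Q(s', a').
q_table[s + (a,)] += alpha * (
    reward + gamma * np.max(q_table[s_next]) - q_table[s + (a,)]
)
```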
## Files

- `train.py`: Code for training the agent.
- `cartPole_qtable.npy`: The trained Q-table.
- `replay.mp4`: A video showing the agent's performance.
## How to Reproduce

1. Install the dependencies:

   ```bash
   pip install gymnasium numpy imageio
   ```

2. Run the training script:

   ```bash
   python train.py
   ```

3. Use the saved Q-table (`cartPole_qtable.npy`) to evaluate the model, as sketched below.
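
A minimal evaluation sketch, assuming the Q-table was saved with `np.save` and the discretization matches the one used during training (both are assumptions; adapt the bin edges to whatever `train.py` actually uses):

```python
import numpy as np
import gymnasium as gym

# Bin edges must match the ones used during training; these are the
# same illustrative values as in the training sketch above.
bins = [
    np.linspace(-2.4, 2.4, 10),
    np.linspace(-3.0, 3.0, 10),
    np.linspace(-0.21, 0.21, 10),
    np.linspace(-3.0, 3.0, 10),
]

def discretize(obs):
    return tuple(int(np.digitize(o, b)) for o, b in zip(obs, bins))

q_table = np.load("cartPole_qtable.npy")

env = gym.make("CartPole-v1")
obs, _ = env.reset()
total_reward, done = 0.0, False

while not done:
    a = int(np.argmax(q_table[discretize(obs)]))  # greedy action, no exploration
    obs, reward, terminated, truncated, _ = env.step(a)
    total_reward += reward
    done = terminated or truncated

print(f"Episode reward: {total_reward}")
```

A well-trained agent should reach the CartPole-v1 reward cap of 500 on most episodes; lower scores suggest the bin edges here do not match the ones used in training.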