---
license: mit
---

Lunar Lander Deep Q-Learning Model

A Deep Q-Network (DQN) implementation to train an agent for the Lunar Lander environment from OpenAI Gym, complete with an interactive visualizer using Pygame.


NOTE: The demo video uses only 10 episodes to keep it short. For meaningful results, train for at least 500 episodes.

Table of Contents

  • General Information
  • Features
  • Tools and Technologies
  • Setup
  • Usage
  • How It Works
  • Adjusting Hyperparameters
  • Using the Trained Model
  • Credits

General Information

This project implements a Deep Q-Network (DQN) to train an agent to solve the Lunar Lander environment from OpenAI Gym. The goal is to teach the agent to safely control a lunar lander to land on the moon's surface by interacting with the environment.

The project includes:

  • A fully implemented DQN algorithm.

  • Real-time visualization of the training process using Pygame.

  • Dynamic plotting of training progress using Matplotlib.


Features

  • Deep Reinforcement Learning:

    • Neural networks approximate the Q-function.
    • Implements experience replay and a target network for stability.
  • Interactive Visualization:

    • Displays the Lunar Lander environment in real-time using Pygame.
    • Dynamically plots training progress alongside the environment.
  • Customizable Training:

    • Adjustable hyperparameters, including learning rate, batch size, discount factor, and more.
  • Environment Solving:

    • Trains the agent to achieve an average score of 200 over 100 consecutive episodes.
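The experience-replay feature above can be sketched with a minimal, framework-free buffer. This is an illustrative sketch, not the project's exact class; the `Experience` tuple and `ReplayBuffer` names are assumptions:

```python
import random
from collections import deque, namedtuple

# One transition observed while interacting with the environment.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and samples random mini-batches."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are evicted automatically
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from a large buffer is what decorrelates the training batches and stabilizes learning.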

Tools and Technologies

Libraries and Frameworks

  • Reinforcement Learning:

    • PyTorch (for building and training the neural network)
    • NumPy (for efficient numerical computations)
  • Visualization:

    • Pygame (for real-time visualization of the environment)
    • Matplotlib (for plotting training progress)
  • Environment:

    • OpenAI Gym (Lunar Lander environment)

Hardware Support

  • CUDA support for GPU acceleration.

Setup

Prerequisites

  • Python 3.8 or higher
  • A compatible GPU (optional but recommended for faster training)

Installation

  1. Clone the Repository:

    git clone https://github.com/yourusername/lunar-lander-dqn.git
    cd lunar-lander-dqn
    
  2. Install Required Packages:

    pip install -r requirements.txt
    
  3. Verify Installation: Ensure that PyTorch is installed with CUDA support if you plan to use a GPU.


Usage

Training the Agent

  1. Run the Training Script:

    python main.py
    
  2. Monitor Training

    • View real-time rendering of the Lunar Lander environment.
    • Observe the dynamic plot of training scores as the agent learns.
  3. Save Model

    • The trained model is automatically saved as checkpoint.pth when the environment is solved (average score ≥ 200 over 100 episodes).
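The "solved" criterion above (average score ≥ 200 over the last 100 episodes) can be sketched like this; `is_solved` is a hypothetical helper, not the project's exact code:

```python
from collections import deque
from statistics import mean

def is_solved(scores_window, target=200.0):
    """Solved once the window is full and the mean of the last 100 episode scores reaches the target."""
    return len(scores_window) == scores_window.maxlen and mean(scores_window) >= target

# A deque with maxlen=100 keeps only the 100 most recent episode scores.
scores_window = deque(maxlen=100)
```

In the training loop, each finished episode's score is appended to the window, and the checkpoint is saved the first time `is_solved` returns `True`.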

How It Works

  1. Q-Network Architecture

    • A feedforward neural network with 2 hidden layers of 64 neurons each
    • Input: Current state of the environment
    • Output: Q-values for all possible actions
  2. Target Network

    • Maintains a separate, slowly updated network to compute target Q-values.
    • Updated periodically (via soft updates) to stabilize training.
  3. Experience Replay

    • Stores past experiences in a replay buffer.
    • Samples random mini-batches for training to break correlations and stabilize learning.
  4. Epsilon-Greedy Policy

    • Balances exploration and exploitation.
    • Decays epsilon over time to focus on exploitation as training progresses.
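Step 4 can be sketched in a framework-free way. This is an illustration under assumptions: the project's agent computes `q_values` with its Q-network, and the decay constants (`0.995`, `0.01`) are typical values, not confirmed from the source:

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                           # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit: argmax Q

def decay_epsilon(epsilon, eps_decay=0.995, eps_min=0.01):
    """Multiplicative decay gradually shifts the agent from exploration to exploitation."""
    return max(eps_min, epsilon * eps_decay)
```

With `epsilon=1.0` every action is random; as epsilon decays toward `eps_min`, the agent increasingly trusts its learned Q-values.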

Adjusting Hyperparameters

You can modify the following hyperparameters in the script to customize training:

  • Learning Rate: LR (default: 5e-4)
  • Batch Size: BATCH_SIZE (default: 64)
  • Discount Factor (Gamma): GAMMA (default: 0.99)
  • Replay Buffer Size: BUFFER_SIZE (default: 1e5)
  • Target Network Update Rate: TAU (default: 1e-3)
  • Update Frequency: UPDATE_EVERY (default: 4)
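To illustrate how TAU is used, here is a sketch of the soft target-network update θ_target ← τ·θ_local + (1−τ)·θ_target, shown on plain Python lists rather than the project's PyTorch tensors:

```python
TAU = 1e-3  # target network update rate (matches the default above)

def soft_update(local_params, target_params, tau=TAU):
    """Blend a small fraction tau of the local weights into the target weights."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]
```

A small tau means the target network changes slowly, which keeps the Q-learning targets stable between updates.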


Using the Trained Model

Once the model is trained, you can use it to perform inference:

  1. Load the Trained Model: Update your script to load the model:

    agent.qnetwork_local.load_state_dict(torch.load('checkpoint.pth'))
    
  2. Run Inference: Use the agent.act() function to make decisions for the Lunar Lander environment:

    state, _ = env.reset()
    done = False
    while not done:
       action = agent.act(state)
       next_state, reward, terminated, truncated, _ = env.step(action)
       done = terminated or truncated
       state = next_state
    

Credits

Created by Vijay Shrivarshan Vijayaraja