---
license: mit
---

Lunar Lander Deep Q-Learning Model

A Deep Q-Network (DQN) implementation to train an agent for the Lunar Lander environment from OpenAI Gym, complete with an interactive visualizer using Pygame.


NOTE: The demo video uses only 10 episodes to keep it short. For meaningful results, train for at least 500 episodes.

Table of Contents

  • General Information
  • Features
  • Tools and Technologies
  • Setup
  • Usage
  • How It Works
  • Adjusting Hyperparameters
  • Using the Trained Model
  • Credits

General Information

This project implements a Deep Q-Network (DQN) to train an agent to solve the Lunar Lander environment from OpenAI Gym. The goal is to teach the agent to safely control a lunar lander to land on the moon's surface by interacting with the environment.

The project includes:

  • A fully implemented DQN algorithm.

  • Real-time visualization of the training process using Pygame.

  • Dynamic plotting of training progress using Matplotlib.


Features

  • Deep Reinforcement Learning:

    • Neural networks approximate the Q-function.
    • Implements experience replay and a target network for stability.
  • Interactive Visualization:

    • Displays the Lunar Lander environment in real-time using Pygame.
    • Dynamically plots training progress alongside the environment.
  • Customizable Training:

    • Adjustable hyperparameters, including learning rate, batch size, discount factor, and more.
  • Environment Solving:

    • Trains the agent to achieve an average score of 200 over 100 consecutive episodes.
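The experience-replay feature above can be sketched with a minimal, framework-free buffer. This is an illustrative sketch, not the project's exact class; the `Experience` tuple and `ReplayBuffer` names are assumptions:

```python
import random
from collections import deque, namedtuple

# One transition observed while interacting with the environment.
Experience = namedtuple("Experience", ["state", "action", "reward", "next_state", "done"])

class ReplayBuffer:
    """Fixed-size buffer that stores transitions and samples random mini-batches."""

    def __init__(self, buffer_size, batch_size, seed=0):
        self.memory = deque(maxlen=buffer_size)  # oldest experiences are evicted automatically
        self.batch_size = batch_size
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.memory.append(Experience(state, action, reward, next_state, done))

    def sample(self):
        # Uniform random sampling breaks the correlation between consecutive steps.
        return self.rng.sample(self.memory, k=self.batch_size)

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from a large buffer is what decorrelates the training batches and stabilizes learning.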

Tools and Technologies

Libraries and Frameworks

  • Reinforcement Learning:

    • PyTorch (for building and training the neural network)
    • NumPy (for efficient numerical computations)
  • Visualization:

    • Pygame (for real-time visualization of the environment)
    • Matplotlib (for plotting training progress)
  • Environment:

    • OpenAI Gym (Lunar Lander environment)

Hardware Support

  • CUDA support for GPU acceleration.

Setup

Prerequisites

  • Python 3.8 or higher
  • A compatible GPU (optional but recommended for faster training)

Installation

  1. Clone the Repository:

    git clone https://github.com/yourusername/lunar-lander-dqn.git
    cd lunar-lander-dqn
    
  2. Install Required Packages:

    pip install -r requirements.txt
    
  3. Verify Installation: Ensure that PyTorch is installed with CUDA support if you plan to use a GPU.


Usage

Training the Agent

  1. Run the Training Script:

    python main.py
    
  2. Monitor Training

    • View real-time rendering of the Lunar Lander environment.
    • Observe the dynamic plot of training scores as the agent learns.
  3. Save Model

    • The trained model is automatically saved as checkpoint.pth when the environment is solved (average score ≥ 200 over 100 episodes).
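The "solved" criterion above (average score ≥ 200 over the last 100 episodes) can be sketched like this; `is_solved` is a hypothetical helper, not the project's exact code:

```python
from collections import deque
from statistics import mean

def is_solved(scores_window, target=200.0):
    """Solved once the window is full and the mean of the last 100 episode scores reaches the target."""
    return len(scores_window) == scores_window.maxlen and mean(scores_window) >= target

# A deque with maxlen=100 keeps only the 100 most recent episode scores.
scores_window = deque(maxlen=100)
```

In the training loop, each finished episode's score is appended to the window, and the checkpoint is saved the first time `is_solved` returns `True`.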

How It Works

  1. Q-Network Architecture

    • A feedforward neural network with 2 hidden layers of 64 neurons each
    • Input: Current state of the environment
    • Output: Q-values for all possible actions
  2. Target Network

    • Maintains a separate, slowly updated network to compute target Q-values.
    • Updated periodically (via soft updates) to stabilize training.
  3. Experience Replay

    • Stores past experiences in a replay buffer.
    • Samples random mini-batches for training to break correlations and stabilize learning.
  4. Epsilon-Greedy Policy

    • Balances exploration and exploitation.
    • Decays epsilon over time to focus on exploitation as training progresses.
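Step 4 can be sketched in a framework-free way. This is an illustration under assumptions: the project's agent computes `q_values` with its Q-network, and the decay constants (`0.995`, `0.01`) are typical values, not confirmed from the source:

```python
import random

def epsilon_greedy_action(q_values, epsilon, rng=random):
    """With probability epsilon explore (random action); otherwise exploit (greedy action)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))                           # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])       # exploit: argmax Q

def decay_epsilon(epsilon, eps_decay=0.995, eps_min=0.01):
    """Multiplicative decay gradually shifts the agent from exploration to exploitation."""
    return max(eps_min, epsilon * eps_decay)
```

With `epsilon=1.0` every action is random; as epsilon decays toward `eps_min`, the agent increasingly trusts its learned Q-values.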

Adjusting Hyperparameters

You can modify the following hyperparameters in the script to customize training:

  • Learning Rate: LR (default: 5e-4)
  • Batch Size: BATCH_SIZE (default: 64)
  • Discount Factor (Gamma): GAMMA (default: 0.99)
  • Replay Buffer Size: BUFFER_SIZE (default: 1e5)
  • Target Network Update Rate: TAU (default: 1e-3)
  • Update Frequency: UPDATE_EVERY (default: 4)
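To illustrate how TAU is used, here is a sketch of the soft target-network update θ_target ← τ·θ_local + (1−τ)·θ_target, shown on plain Python lists rather than the project's PyTorch tensors:

```python
TAU = 1e-3  # target network update rate (matches the default above)

def soft_update(local_params, target_params, tau=TAU):
    """Blend a small fraction tau of the local weights into the target weights."""
    return [tau * l + (1.0 - tau) * t for l, t in zip(local_params, target_params)]
```

A small tau means the target network changes slowly, which keeps the Q-learning targets stable between updates.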


Using the Trained Model

Once the model is trained, you can use it to perform inference:

  1. Load the Trained Model: Update your script to load the model:

    agent.qnetwork_local.load_state_dict(torch.load('checkpoint.pth'))
    
  2. Run Inference: Use the agent.act() function to make decisions for the Lunar Lander environment:

    state, _ = env.reset()
    done = False
    while not done:
       action = agent.act(state)
       next_state, reward, terminated, truncated, _ = env.step(action)
       done = terminated or truncated
       state = next_state
    

Credits

Created by Vijay Shrivarshan Vijayaraja