Vanilla DQN – Breakout

This model was trained on ALE/Breakout-v5 using vanilla DQN with 4 vectorised (parallel) environments.

Environment

Property        Value
Environment     ALE/Breakout-v5
State space     4 × 84 × 84 stacked grayscale frames
Action space    4 discrete (NOOP, FIRE, RIGHT, LEFT)
Frameskip       4
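
The 4 × 84 × 84 state is a stack of the four most recent preprocessed frames. A minimal sketch of that stacking, assuming frames arrive already converted to 84 × 84 grayscale. The `FrameStack` class and its methods are hypothetical names used for illustration, not part of this model's training code:

```python
from collections import deque

import torch


class FrameStack:
    """Keeps the k most recent 84x84 grayscale frames (hypothetical helper)."""

    def __init__(self, k: int = 4):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, first_frame: torch.Tensor) -> torch.Tensor:
        # At episode start, fill the stack with copies of the first frame.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(first_frame)
        return self.state()

    def step(self, frame: torch.Tensor) -> torch.Tensor:
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return self.state()

    def state(self) -> torch.Tensor:
        # Shape (k, 84, 84), oldest frame first.
        return torch.stack(list(self.frames), dim=0)


stack = FrameStack(k=4)
state = stack.reset(torch.zeros(84, 84, dtype=torch.uint8))
state = stack.step(torch.ones(84, 84, dtype=torch.uint8))
print(state.shape)  # torch.Size([4, 84, 84])
```

Filling the stack with copies of the first frame on reset is one common convention; the training code may handle episode starts differently.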

Algorithm: Vanilla DQN

In vanilla DQN the target network both selects the greedy next action and evaluates its value (Double DQN, by contrast, selects the action with the online network):

next_q   = q_target(next_state).max(dim=1).values   # target picks AND evaluates (max over actions, per sample)
target_q = reward + γ * (1 - done) * next_q
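
The update above can be sketched as a batched PyTorch function. This is an illustration under stated assumptions, not the training code: the function name `dqn_loss`, the Huber loss, and the toy linear "networks" are all assumptions, since the source names only the target computation.

```python
import torch
import torch.nn.functional as F

gamma = 0.99  # discount from the hyperparameter table


def dqn_loss(q_online, q_target, state, action, reward, next_state, done):
    # Q(s, a) of the online network for the actions actually taken.
    q_sa = q_online(state).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Vanilla DQN: the target network picks and evaluates the next action.
        next_q = q_target(next_state).max(dim=1).values
        target_q = reward + gamma * (1.0 - done) * next_q
    # ASSUMPTION: Huber loss; the source does not name the loss function.
    return F.smooth_l1_loss(q_sa, target_q), target_q


# Toy check with linear "networks" on 3-dimensional states and 4 actions.
torch.manual_seed(0)
q_online = torch.nn.Linear(3, 4)
q_target = torch.nn.Linear(3, 4)
state, next_state = torch.randn(2, 3), torch.randn(2, 3)
action = torch.tensor([0, 3])
reward = torch.tensor([1.0, 0.0])
done = torch.tensor([0.0, 1.0])  # second transition is terminal

loss, target_q = dqn_loss(q_online, q_target, state, action, reward, next_state, done)
```

For the terminal transition the `(1 - done)` factor zeroes the bootstrap term, so its target is just the reward. Taking the max along `dim=1` matters: a bare `.max()` would reduce over the whole batch and return a single scalar.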

Hyperparameters

Parameter           Value
Global steps        750,000
Buffer size         100,000
Batch size          64
Parallel envs       4
Learning rate       0.0001
Discount (γ)        0.99
Target sync freq    1000
Grad clip norm      10.0
Epsilon start       1.0
Epsilon end         0.01
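
The table gives the start and end values of epsilon but not the decay schedule. The sketch below collects the listed values in a config object and shows one common choice, linear decay. The names `DQNConfig`, `eps_decay_steps`, and `epsilon_at`, and the 75,000-step decay length, are assumptions made for illustration and do not come from the training code:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DQNConfig:
    # Values copied from the hyperparameter table.
    global_steps: int = 750_000
    buffer_size: int = 100_000
    batch_size: int = 64
    num_envs: int = 4
    learning_rate: float = 1e-4
    gamma: float = 0.99
    target_sync_freq: int = 1000
    grad_clip_norm: float = 10.0
    eps_start: float = 1.0
    eps_end: float = 0.01
    # ASSUMPTION: decay length is not stated in the source.
    eps_decay_steps: int = 75_000


def epsilon_at(step: int, cfg: DQNConfig) -> float:
    """Linear decay from eps_start to eps_end, then constant (assumed schedule)."""
    progress = min(step / cfg.eps_decay_steps, 1.0)
    return cfg.eps_start + progress * (cfg.eps_end - cfg.eps_start)


cfg = DQNConfig()
for step in (0, 37_500, 75_000, 750_000):
    print(step, round(epsilon_at(step, cfg), 4))
```

An exponential schedule would fit the same start and end values equally well; only the training code can confirm which was used.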

Usage

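import torch
# DQN is the network class used in training; define or import it before this point.
model = DQN()
# Load on CPU so the checkpoint also works on machines without a GPU.
checkpoint = torch.load('best_breakout.pt', map_location='cpu')
model.load_state_dict(checkpoint['model'])
model.eval()  # switch to inference mode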
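
The usage snippet above assumes a `DQN` class is already defined. The card does not describe the network, so the sketch below is only a guess: it uses the classic Atari DQN convolutional layout sized for 4 × 84 × 84 inputs and 4 actions. If the checkpoint was trained with a different architecture or different layer names, `load_state_dict` will fail and the class must be taken from the original training code instead.

```python
import torch
import torch.nn as nn


class DQN(nn.Module):
    """ASSUMED architecture (classic Atari DQN CNN); the checkpoint's may differ."""

    def __init__(self, in_frames: int = 4, n_actions: int = 4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_frames, 32, kernel_size=8, stride=4), nn.ReLU(),  # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),         # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),         # 9 -> 7
            nn.Flatten(),
        )
        self.head = nn.Sequential(
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ASSUMPTION: inputs are uint8 pixels, scaled here to [0, 1].
        return self.head(self.features(x.float() / 255.0))


model = DQN().eval()
obs = torch.zeros(1, 4, 84, 84, dtype=torch.uint8)  # one stacked observation
with torch.no_grad():
    q_values = model(obs)                        # shape (1, 4): one Q-value per action
    action = int(q_values.argmax(dim=1).item())  # greedy action index
```

The greedy index maps onto the action list in the environment table (0 = NOOP, 1 = FIRE, 2 = RIGHT, 3 = LEFT).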