Vanilla DQN – Breakout
A vanilla DQN agent trained on ALE/Breakout-v5 with vectorised environments.
Environment
| Property | Value |
|---|---|
| Environment | ALE/Breakout-v5 |
| State space | 4 × 84 × 84 stacked grayscale frames |
| Action space | 4 discrete (NOOP, FIRE, RIGHT, LEFT) |
| Frameskip | 4 |
Algorithm: Vanilla DQN
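The target network both selects the greedy next action and estimates its value:
next_q = q_target(next_state).max(dim=1).values  # per-sample max over actions: target net picks AND evaluates
target_q = reward + gamma * (1 - done) * next_q  # gamma is the discount γ; done = 1 zeroes the bootstrap term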
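The target computation above can be sketched as a small batched PyTorch function. This is a minimal illustration, not the training code: the function name `td_target`, the tensor shapes, and the toy `q_target` stand-in are assumptions made for the example.

```python
import torch


def td_target(q_target, next_state, reward, done, gamma=0.99):
    """Vanilla DQN target: r + gamma * (1 - done) * max_a Q_target(s', a).

    Assumed shapes (not stated in the source): reward and done are float
    tensors of shape (batch,), and q_target returns (batch, n_actions).
    """
    with torch.no_grad():  # targets are constants; no gradient flows through them
        # Max over the action dimension, one value per sample in the batch.
        next_q = q_target(next_state).max(dim=1).values
    return reward + gamma * (1.0 - done) * next_q


# Toy stand-in for the target network: returns fixed Q-values for 2 samples.
def toy_q_target(_state):
    return torch.tensor([[1.0, 3.0, 2.0, 0.0],
                         [0.5, 0.1, 0.2, 4.0]])


reward = torch.tensor([1.0, 0.0])
done = torch.tensor([0.0, 1.0])  # second transition is terminal
targets = td_target(toy_q_target, next_state=None, reward=reward, done=done)
```

The first sample bootstraps from its best next-state value (1 + 0.99 × 3), while the terminal second sample keeps only its reward.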
Hyperparameters
| Parameter | Value |
|---|---|
| Global steps | 750,000 |
| Buffer size | 100,000 |
| Batch size | 64 |
| Parallel envs | 4 |
| Learning rate | 0.0001 |
| Discount (γ) | 0.99 |
| Target sync freq | 1000 |
| Grad clip norm | 10.0 |
| Epsilon start | 1.0 |
| Epsilon end | 0.01 |
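One way to carry these values in code is a frozen dataclass, sketched below. The table gives the epsilon endpoints but not the decay schedule, so the linear annealing and the `exploration_fraction` field are assumptions added for illustration only; the names `Config` and `epsilon_at` are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    # Values copied from the hyperparameter table.
    global_steps: int = 750_000
    buffer_size: int = 100_000
    batch_size: int = 64
    num_envs: int = 4
    learning_rate: float = 1e-4
    gamma: float = 0.99
    target_sync_freq: int = 1_000
    grad_clip_norm: float = 10.0
    epsilon_start: float = 1.0
    epsilon_end: float = 0.01
    # ASSUMPTION: not in the table. Fraction of training over which
    # epsilon is annealed; the actual schedule may differ.
    exploration_fraction: float = 0.1


def epsilon_at(step: int, cfg: Config) -> float:
    """Linearly anneal epsilon from start to end, then hold it constant."""
    decay_steps = cfg.exploration_fraction * cfg.global_steps
    progress = min(step / decay_steps, 1.0)
    return cfg.epsilon_start + progress * (cfg.epsilon_end - cfg.epsilon_start)


cfg = Config()
eps_first = epsilon_at(0, cfg)
eps_mid = epsilon_at(37_500, cfg)
eps_last = epsilon_at(cfg.global_steps, cfg)
```

Under the assumed fraction of 0.1, epsilon reaches its floor after 75,000 steps and stays there for the rest of training.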
Usage
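import torch
# DQN is the network class from the training code; it is not defined in this snippet.
model = DQN()
checkpoint = torch.load('best_breakout.pt', map_location='cpu')  # load on CPU regardless of training device
model.load_state_dict(checkpoint['model'])
model.eval()  # switch to inference mode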
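Once a model is loaded, greedy play means taking the argmax over its Q-values. The sketch below shows that step with a hypothetical stand-in network, `SketchDQN`. The real architecture is not given here, so the layer sizes (the commonly used three-convolution Atari layout) and the division by 255 are assumptions, and the checkpoint will only load into the class it was trained with.

```python
import torch
import torch.nn as nn

ACTIONS = ("NOOP", "FIRE", "RIGHT", "LEFT")  # order from the Environment table


class SketchDQN(nn.Module):
    """Hypothetical stand-in network; NOT necessarily the trained architecture."""

    def __init__(self, n_actions: int = 4) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),   # 84 -> 20
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),  # 20 -> 9
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),  # 9 -> 7
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ASSUMPTION: pixel values in [0, 255] are scaled to [0, 1].
        return self.net(x / 255.0)


@torch.no_grad()
def greedy_action(model: nn.Module, obs: torch.Tensor) -> int:
    """Return the index of the highest-valued action for one 4x84x84 observation."""
    q_values = model(obs.unsqueeze(0).float())  # add a batch dimension
    return int(q_values.argmax(dim=1).item())


model = SketchDQN().eval()
obs = torch.randint(0, 256, (4, 84, 84), dtype=torch.uint8)  # random stand-in frames
action = greedy_action(model, obs)
print(ACTIONS[action])
```

With untrained weights the printed action is arbitrary; the point is only the shape handling and the argmax over the four actions.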