DQN Agent playing MountainCar-v0
This is a trained DQN agent for MountainCar-v0. The Q-network is a three-layer MLP, and the transitions collected during training are stored in a replay buffer. Once the network has converged, training stops and the agent's performance is validated against a random baseline.
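The replay buffer mentioned above could look roughly like the sketch below. It is an illustration, not the code used to train this model: the `ReplayBuffer` class, its method names, and the transition layout `(state, action, reward, next_state, done)` are assumptions, and the defaults simply mirror the `buffer_size` and `batch_size` values listed in this card.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size FIFO store of transitions (hypothetical helper)."""

    def __init__(self, capacity=10000):
        # A deque with maxlen drops the oldest transition once it is full.
        self.memory = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        # Assumed transition layout; the card does not specify it.
        self.memory.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        # Uniform sampling without replacement from the stored transitions.
        batch = random.sample(self.memory, batch_size)
        # Transpose a list of transitions into a tuple of columns:
        # (states, actions, rewards, next_states, dones).
        return tuple(zip(*batch))

    def __len__(self):
        return len(self.memory)
```

Sampling uniformly from past transitions, rather than training on consecutive steps, reduces the correlation between the samples in a minibatch.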
Parameters:
hidden_size = 64
gamma = 0.99
epsilon_decay = 0.999
buffer_size = 10000
batch_size = 64
episodes = 10000
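As a hedged sketch of how `gamma` and `epsilon_decay` typically enter a DQN training loop: the code below assumes that epsilon is multiplied by `epsilon_decay` once per episode, starting at 1.0 and clipped at a floor of 0.01. The start value, the floor, and the per-episode schedule are assumptions, since the card does not state them. The target shown is the standard one-step Q-learning target; whether a separate target network was used is not stated either.

```python
GAMMA = 0.99           # from the card
EPSILON_DECAY = 0.999  # from the card
EPSILON_START = 1.0    # assumption: not stated in the card
EPSILON_MIN = 0.01     # assumption: not stated in the card


def epsilon_at(episode):
    """Exploration rate after `episode` multiplicative decay steps."""
    return max(EPSILON_MIN, EPSILON_START * EPSILON_DECAY ** episode)


def td_target(reward, next_q_values, done):
    """One-step Q-learning target: r + gamma * max_a' Q(s', a').

    On terminal transitions there is no bootstrap term.
    """
    if done:
        return reward
    return reward + GAMMA * max(next_q_values)
```

Under these assumptions epsilon would reach its floor after roughly 4,600 of the 10,000 episodes, so the remaining episodes would run almost greedily.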
Evaluation results
- mean_reward on MountainCar-v0 (self-reported): -120.10 +/- 19.30
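A figure of this form is usually the mean and standard deviation of the undiscounted return over a set of evaluation episodes. The sketch below assumes that reading; the card does not say how many episodes were used or whether the spread is a standard deviation. The helper `summarize_returns` is hypothetical and the numbers in it are made up for illustration only. For context, MountainCar-v0 gives a reward of -1 per step and caps episodes at 200 steps, so a policy that never reaches the flag scores -200.

```python
from statistics import mean, pstdev


def summarize_returns(returns):
    """Return (mean, population standard deviation) of episode returns."""
    return mean(returns), pstdev(returns)


# Illustrative, made-up episode returns (NOT the model's actual results).
agent_returns = [-100.0, -140.0, -120.0, -120.0]
# A policy that never reaches the goal gets -200 in every episode.
baseline_returns = [-200.0] * 4

agent_mean, agent_std = summarize_returns(agent_returns)
base_mean, base_std = summarize_returns(baseline_returns)
print(f"agent:    {agent_mean:.2f} +/- {agent_std:.2f}")
print(f"baseline: {base_mean:.2f} +/- {base_std:.2f}")
```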