Q-Learning – CliffWalking-v1
This agent was trained on CliffWalking-v1 using tabular Q-Learning implemented from scratch.
- Observation space: Discrete(48)
- Action space: Discrete(4)
- Mean Reward: -13.00 ± 0.00 (the optimal return: the shortest safe path takes 13 steps at -1 reward each)
- Episodes: 100,000
- Learning rate: 0.7 | Gamma (discount factor): 0.95 | Epsilon decay: 0.0005
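
The settings listed above can be tried out with a minimal sketch of tabular Q-Learning. This is not the original training script. To stay dependency-free it uses a small hand-written stand-in for the 4 x 12 cliff grid rather than the Gymnasium environment, and it assumes an exponential epsilon schedule. The names `step`, `q_update`, `train`, `greedy_return`, `eps_max`, `eps_min`, and `max_steps` are hypothetical, and the card only specifies the learning rate, gamma, and the decay value.

```python
import math
import random

N_ROWS, N_COLS = 4, 12                      # 4 x 12 grid -> 48 states
N_STATES, N_ACTIONS = N_ROWS * N_COLS, 4
START, GOAL = 36, 47                        # bottom-left and bottom-right cells
MOVES = {0: (-1, 0), 1: (0, 1), 2: (1, 0), 3: (0, -1)}  # up, right, down, left


def step(state, action):
    """Stand-in dynamics (assumption): return (next_state, reward, done)."""
    row, col = divmod(state, N_COLS)
    d_row, d_col = MOVES[action]
    row = min(max(row + d_row, 0), N_ROWS - 1)   # walls clamp the move
    col = min(max(col + d_col, 0), N_COLS - 1)
    if row == N_ROWS - 1 and 0 < col < N_COLS - 1:
        return START, -100, False                # cliff: penalty, back to start
    next_state = row * N_COLS + col
    return next_state, -1, next_state == GOAL


def q_update(q, state, action, reward, next_state, lr=0.7, gamma=0.95):
    """One Q-Learning update, in place, with the card's lr and gamma."""
    td_target = reward + gamma * max(q[next_state])
    q[state][action] += lr * (td_target - q[state][action])


def train(n_episodes, max_steps=200, decay=0.0005,
          eps_max=1.0, eps_min=0.05, seed=0):
    """Epsilon-greedy training; eps_max, eps_min, max_steps are assumed values."""
    rng = random.Random(seed)
    q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]
    for episode in range(n_episodes):
        # Assumed schedule: exponential decay from eps_max towards eps_min.
        epsilon = eps_min + (eps_max - eps_min) * math.exp(-decay * episode)
        state = START
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = rng.randrange(N_ACTIONS)                         # explore
            else:
                action = max(range(N_ACTIONS), key=lambda a: q[state][a])  # exploit
            next_state, reward, done = step(state, action)
            q_update(q, state, action, reward, next_state)
            state = next_state
            if done:
                break
    return q


def greedy_return(q, max_steps=200):
    """Total reward of one greedy rollout from the start state."""
    state, total = START, 0
    for _ in range(max_steps):
        action = max(range(N_ACTIONS), key=lambda a: q[state][a])
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total


if __name__ == "__main__":
    q_table = train(n_episodes=2000)   # far fewer episodes than the card's 100,000
    print("greedy return:", greedy_return(q_table))
```

No greedy rollout in this grid can score better than -13, so a printed value of -13 would mean the sketch found a shortest safe path. Whether a short run gets there depends on the assumed schedule and seed.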