Deep RL Course documentation

Q-Learning Recap

Q-Learning is the RL algorithm that:

- Trains a Q-function, an action-value function that is stored in memory as a Q-table containing the values of all state-action pairs.
- Given a state and an action, the Q-function looks up the corresponding value in its Q-table.
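To make the lookup concrete, here is a minimal Python sketch that stores a Q-table as a NumPy array with one row per state and one column per action. The table sizes, the value 0.75, and the helper name `q_value` are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes for illustration: 4 states, 2 actions.
n_states, n_actions = 4, 2

# The Q-table: one row per state, one column per action.
q_table = np.zeros((n_states, n_actions))

# Pretend some training has already filled in one entry.
q_table[2, 1] = 0.75

def q_value(state: int, action: int) -> float:
    """Look up Q(state, action) in the Q-table."""
    return float(q_table[state, action])

print(q_value(2, 1))  # 0.75
print(q_value(0, 0))  # 0.0
```

Because the table is indexed directly by state and action, each lookup is a constant-time array access, which is what makes tabular Q-Learning practical for small, discrete environments.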

(Figure: the Q-function)

- When training is done, we have an optimal Q-function or, equivalently, an optimal Q-table.
- If we have an optimal Q-function, we also have an optimal policy, since for each state we know the best action to take: the one with the highest Q-value.
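Deriving the policy from the Q-table amounts to picking, in each state, the action with the highest Q-value. A small sketch, assuming a hypothetical already-trained table with 3 states and 2 actions and an illustrative helper named `greedy_policy` (neither comes from the course):

```python
import numpy as np

# A hypothetical, already-trained Q-table: 3 states x 2 actions.
q_table = np.array([
    [0.1, 0.9],  # state 0: action 1 is best
    [0.5, 0.2],  # state 1: action 0 is best
    [0.0, 0.3],  # state 2: action 1 is best
])

def greedy_policy(q: np.ndarray, state: int) -> int:
    """Return the action with the highest Q-value in the given state."""
    return int(np.argmax(q[state]))

# The policy is fully determined by the table: one best action per state.
policy = [greedy_policy(q_table, s) for s in range(q_table.shape[0])]
print(policy)  # [1, 0, 1]
```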

(Figure: link between the value function and the policy)

At the beginning, however, the Q-table is useless, since it gives arbitrary values for each state-action pair (most of the time we initialize the Q-table to 0). As we explore the environment and update the Q-table, it gives us a better and better approximation of the optimal Q-function.
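The update that improves these estimates is the standard Q-Learning rule, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], where α is the learning rate and γ the discount factor. The sketch below applies it once to a zero-initialized table; the table size, hyperparameter values, the transition, and the function name `q_update` are assumptions for illustration.

```python
import numpy as np

n_states, n_actions = 3, 2
learning_rate = 0.1  # alpha (assumed value)
gamma = 0.99         # discount factor (assumed value)

# Start from an all-zero Q-table: every estimate is initially arbitrary.
q_table = np.zeros((n_states, n_actions))

def q_update(q, state, action, reward, next_state, alpha, gamma):
    """Apply one Q-Learning update in place and return the new Q(state, action)."""
    # TD target: immediate reward plus the discounted best value of the next state.
    td_target = reward + gamma * np.max(q[next_state])
    td_error = td_target - q[state, action]
    q[state, action] += alpha * td_error
    return float(q[state, action])

# One hypothetical transition: in state 0, action 1 yields reward 1.0 and leads to state 1.
new_value = q_update(q_table, 0, 1, 1.0, 1, learning_rate, gamma)
print(new_value)  # 0.1
```

Because the table starts at zero, the TD target here is just the reward 1.0, so the entry moves one learning-rate step toward it and becomes 0.1. Repeating such updates over many transitions is what gradually turns the arbitrary initial table into a useful one.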

This is the Q-Learning pseudocode:

(Figure: Q-Learning pseudocode)
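As a rough, runnable counterpart to the Q-Learning pseudocode, here is a minimal tabular sketch in Python. It assumes a tiny hypothetical corridor environment (states 0 to 4, goal at 4) that is not part of the course, uses ε-greedy exploration with a fixed ε for simplicity, and picks illustrative hyperparameter values; treat it as a sketch, not as the course's reference implementation.

```python
import random
import numpy as np

# Hypothetical toy environment: a 1-D corridor with states 0..4.
# Action 0 moves left, action 1 moves right; reaching state 4 ends the
# episode with reward 1.0, every other step gives reward 0.0.
N_STATES, N_ACTIONS, GOAL = 5, 2, 4

def step(state: int, action: int) -> tuple[int, float, bool]:
    next_state = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def epsilon_greedy(q, state, epsilon, rng):
    # Explore with probability epsilon, otherwise exploit the current estimates.
    if rng.random() < epsilon:
        return rng.randrange(N_ACTIONS)
    values = q[state]
    # Break ties randomly so an all-zero table does not always pick action 0.
    best = [a for a in range(N_ACTIONS) if values[a] == values.max()]
    return rng.choice(best)

def train(n_episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = np.zeros((N_STATES, N_ACTIONS))  # initialize the Q-table to 0
    for _ in range(n_episodes):
        state = 0
        for _ in range(max_steps):
            action = epsilon_greedy(q, state, epsilon, rng)
            next_state, reward, done = step(state, action)
            # Terminal states have no future value, so the TD target is just the reward.
            target = reward if done else reward + gamma * np.max(q[next_state])
            q[state, action] += alpha * (target - q[state, action])
            state = next_state
            if done:
                break
    return q

q_table = train()
greedy = [int(np.argmax(q_table[s])) for s in range(GOAL)]
print(greedy)
```

With these settings the greedy policy read off the trained table should point right (action 1) in every non-goal state. A fixed ε keeps the sketch short; decaying ε over episodes is a common alternative that shifts the agent from exploration toward exploitation as the Q-table improves.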
