Q-Learning Recap
Q-Learning is the RL algorithm that:
Trains a Q-function, an action-value function encoded internally by a Q-table that contains all the state-action pair values.
Given a state and an action, our Q-function searches its Q-table for the corresponding value, as shown in the sketch below.
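
To make the table concrete, here is a minimal sketch of that lookup in Python; the environment sizes (4 states, 2 actions) and the helper name q_value are made up for illustration:

```python
import numpy as np

# Hypothetical toy environment with 4 states and 2 actions.
n_states, n_actions = 4, 2

# The Q-table: one row per state, one column per action,
# holding the value of each state-action pair.
q_table = np.zeros((n_states, n_actions))

def q_value(state, action):
    """Given a state and an action, look up the corresponding value."""
    return q_table[state, action]

print(q_value(0, 1))  # 0.0 before any training
```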

When training is done, we have an optimal Q-function, or, equivalently, an optimal Q-table.
And if we have an optimal Q-function, we have an optimal policy, since we know the best action to take in each state (the greedy policy sketched below).
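
As a sketch of that link, assuming the hypothetical Q-table layout above, the greedy policy just reads the arg-max action out of the table:

```python
import numpy as np

def greedy_policy(q_table, state):
    """The policy implied by a Q-table: in each state,
    take the action with the highest Q-value."""
    return int(np.argmax(q_table[state]))

# With an optimal Q-table, this greedy policy is the optimal policy.
q_table = np.array([[0.5, 1.2],   # hypothetical learned values
                    [0.9, 0.3]])
print(greedy_policy(q_table, state=0))  # -> 1 (the higher-valued action)
```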

At the beginning, though, our Q-table is useless, since it gives arbitrary values for each state-action pair (most of the time we initialize the Q-table to 0). As we explore the environment and update our Q-table, it gives us a better and better approximation of the optimal Q-function.

This is the Q-Learning pseudocode: initialize the Q-table, then, for each training step, choose an action from the current state with an epsilon-greedy policy, take it, observe the reward and the next state, and move Q(s, a) toward the temporal-difference target r + gamma * max_a' Q(s', a').
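
Below is a minimal sketch of that loop in Python. It assumes a Gymnasium-style environment with discrete states and actions; the function name q_learning and the hyperparameter values are illustrative:

```python
import numpy as np

def q_learning(env, n_episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Minimal Q-Learning loop; hyperparameter values are illustrative."""
    # Start from an all-zero Q-table: arbitrary values before training.
    q_table = np.zeros((env.observation_space.n, env.action_space.n))

    for _ in range(n_episodes):
        state, _ = env.reset()
        done = False
        while not done:
            # Epsilon-greedy: explore with probability epsilon, else exploit.
            if np.random.random() < epsilon:
                action = env.action_space.sample()
            else:
                action = int(np.argmax(q_table[state]))

            next_state, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated

            # TD update:
            # Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
            # (the bootstrap term is zeroed out on terminal states)
            td_target = reward + gamma * np.max(q_table[next_state]) * (not terminated)
            q_table[state, action] += alpha * (td_target - q_table[state, action])
            state = next_state

    return q_table

# Usage (with any Gymnasium discrete env, e.g. FrozenLake):
#   import gymnasium as gym
#   env = gym.make("FrozenLake-v1")
#   q_table = q_learning(env)
```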
