Deep RL Course documentation

Q-Learning Recap

Q-Learning is the RL algorithm that:

  • Trains a Q-function, an action-value function whose internal memory is a Q-table containing the value of every state-action pair.

  • Given a state and an action, our Q-function looks up the corresponding value in its Q-table.

  • When the training is done, we have an optimal Q-function, and therefore an optimal Q-table.

  • And if we have an optimal Q-function, we have an optimal policy, since we know the best action to take in each state.
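
To make the points above concrete, here is a minimal sketch of a Q-table, the lookup the Q-function performs, and the policy read off from it. The table sizes, the function names `q_value` and `greedy_action`, and the example values are assumptions chosen for illustration; they do not come from the course code, and plain Python lists stand in for whatever array type a real implementation would use.

```python
# Hypothetical sizes for a tiny environment: 4 states, 2 actions.
n_states, n_actions = 4, 2

# The Q-table: one row per state, one column per action.
q_table = [[0.0] * n_actions for _ in range(n_states)]


def q_value(q_table, state, action):
    """Q-function lookup: return the stored value for a state-action pair."""
    return q_table[state][action]


def greedy_action(q_table, state):
    """Policy derived from the Q-table: pick the action with the highest value."""
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


# Pretend training has filled in the row for state 2 (illustrative values only).
q_table[2] = [0.1, 0.7]

best = greedy_action(q_table, 2)   # index of the best action in state 2
value = q_value(q_table, 2, best)  # value stored for that pair
```

If the table held the optimal values, calling `greedy_action` in every state would give the optimal policy.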

But in the beginning, our Q-table is useless, since it gives an arbitrary value for each state-action pair (most of the time we initialize the Q-table to 0). As we explore the environment and update our Q-table, it gives us better and better approximations of the optimal values.
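
As a rough illustration of how a zero-initialized table improves, the sketch below applies one step of the standard Q-Learning update to a single transition. The function name `q_learning_update`, the table size, the transition, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for illustration, and terminal-state handling is left out.

```python
def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99):
    """Apply one Q-Learning update in place for a single observed transition."""
    # Best value currently estimated for the next state.
    best_next = max(q_table[next_state])
    # Target: immediate reward plus discounted estimate of future value.
    td_target = reward + gamma * best_next
    # Move the stored value a fraction alpha toward the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])


# A zero-initialized table for a hypothetical 3-state, 2-action environment.
q_table = [[0.0, 0.0] for _ in range(3)]

# One hypothetical transition: in state 0, action 1 gave reward 1.0
# and led to state 1.
q_learning_update(q_table, state=0, action=1, reward=1.0, next_state=1)
```

After this single update only the entry for state 0, action 1 has moved away from 0; repeating the update over many explored transitions is what gradually refines the whole table.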


This is the Q-Learning pseudocode: