Q-Learning is the RL algorithm that:
Trains a Q-function, an action-value function whose internal memory is a Q-table containing all the state-action pair values.
Given a state and an action, our Q-function will look up the corresponding value in its Q-table.
When the training is done, we have an optimal Q-function, and therefore an optimal Q-table.
And if we have an optimal Q-function, we have an optimal policy, since we know, for each state, the best action to take.
But in the beginning, our Q-table is useless since it gives arbitrary values for each state-action pair (most of the time, we initialize the Q-table to 0). As we explore the environment and update our Q-table, it will give us better and better approximations.
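The ideas above can be sketched in a few lines of Python. This is a minimal illustration, not a full training loop: the environment size, the transition, and the function names are hypothetical, chosen only to show how a zero-initialized Q-table is read (take the argmax over actions) and updated toward better approximations.

```python
import numpy as np

# Toy sizes for a hypothetical discrete environment.
n_states, n_actions = 4, 2

# Initialize the Q-table to 0 values: at this point it is "useless",
# since every state-action pair looks equally good.
q_table = np.zeros((n_states, n_actions))

def best_action(state):
    # Reading the policy out of the Q-table: for a given state,
    # pick the action with the highest Q-value.
    return int(np.argmax(q_table[state]))

def q_update(state, action, reward, next_state, alpha=0.1, gamma=0.99):
    # Q-Learning update: move Q(s, a) toward the TD target
    # r + gamma * max_a' Q(s', a').
    td_target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (td_target - q_table[state, action])

# One hypothetical transition: taking action 1 in state 0 yields
# reward 1.0 and lands in state 2. After the update, the Q-table
# already prefers action 1 in state 0.
q_update(state=0, action=1, reward=1.0, next_state=2)
```

Each experienced transition nudges one cell of the table, which is why repeated exploration gradually turns the arbitrary initial values into useful estimates.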
This is the Q-Learning pseudocode: