# Q-Learning Recap

*Q-Learning* **is the RL algorithm that** trains a *Q-function*, an **action-value function** that holds, as internal memory, a *Q-table* **containing all the state-action pair values.** Given a state and an action, our Q-function **looks up the corresponding value in its Q-table.**

When the training is done, **we have an optimal Q-function, and therefore an optimal Q-table.** And if we **have an optimal Q-function**, we have an optimal policy, since we **know, for each state, the best action to take.**
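To make the lookup and the link between Q-table and policy concrete, here is a minimal sketch. It assumes the Q-table is stored as a NumPy array with one row per state and one column per action; the table sizes, the values, and the helper names `q_value` and `greedy_policy` are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical toy problem: 3 states, 2 actions (sizes chosen for illustration).
n_states, n_actions = 3, 2

# Assumed layout: one row per state, one column per action.
# The values below are made up; a trained agent would have learned them.
q_table = np.array([
    [0.1, 0.5],  # state 0
    [0.7, 0.2],  # state 1
    [0.0, 0.3],  # state 2
])


def q_value(q_table, state, action):
    """Look up the stored value of a state-action pair."""
    return q_table[state, action]


def greedy_policy(q_table, state):
    """Return the action with the highest Q-value in the given state."""
    return int(np.argmax(q_table[state]))


print(q_value(q_table, 1, 0))                                 # 0.7
print([greedy_policy(q_table, s) for s in range(n_states)])   # [1, 0, 1]
```

Once the table holds good values, the policy needs no separate representation: reading off the best column in each row is enough.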

In the beginning, however, our **Q-table is useless, since it gives arbitrary values for each state-action pair (most of the time, we initialize the Q-table to zeros)**. But as we explore the environment and update our Q-table, it gives us better and better approximations of the optimal values.
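As a rough illustration of how a zero-initialized table starts to improve, the sketch below applies one update to a single state-action pair. It assumes the standard temporal-difference form of the Q-Learning update; the table sizes, the function name `q_update`, and the learning-rate and discount values are hypothetical choices for the example.

```python
import numpy as np

# Hypothetical sizes, for illustration only.
n_states, n_actions = 4, 2

# Most common initialization: every state-action value starts at 0.
q_table = np.zeros((n_states, n_actions))


def q_update(q_table, state, action, reward, next_state, lr=0.1, gamma=0.99):
    """Apply one Q-Learning update in place (assumed standard TD form).

    lr is the learning rate and gamma the discount factor; both default
    values are illustrative, not prescribed by the text.
    """
    # Target: immediate reward plus discounted best value of the next state.
    td_target = reward + gamma * np.max(q_table[next_state])
    # Move the current estimate a fraction lr of the way toward the target.
    td_error = td_target - q_table[state, action]
    q_table[state, action] += lr * td_error
    return q_table


# One made-up transition: in state 0, action 1 yields reward 1.0 and leads to state 2.
q_update(q_table, state=0, action=1, reward=1.0, next_state=2)
print(q_table[0, 1])
```

After this single step, only the visited entry has moved away from zero; repeated over many transitions, the same rule gradually fills in the rest of the table.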

This is the Q-Learning pseudocode: