# Mid-way Recap

Before diving into Q-Learning, let’s summarize what we’ve just learned.

We have two types of value-based functions:

- The *state-value function* outputs the expected return if **the agent starts at a given state and acts according to the policy forever after.**
- The *action-value function* outputs the expected return if **the agent starts in a given state, takes a given action at that state**, and then acts according to the policy forever after.
- In value-based methods, rather than learning the policy, **we define the policy by hand** (for instance, a greedy policy) and we learn a value function. If we have an optimal value function, we **will have an optimal policy.**
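To make the last point concrete, here is a minimal sketch of a hand-defined greedy policy reading from a learned action-value function. The Q-table values and the `greedy_policy` name are illustrative, not from the course:

```python
import numpy as np

# A toy action-value table Q[state, action] for 3 states and 2 actions.
# These values are hypothetical; in practice they would be learned.
Q = np.array([
    [1.0, 3.0],
    [2.0, 0.5],
    [0.0, 4.0],
])

def greedy_policy(Q, state):
    """The hand-defined policy: pick the action with the highest Q-value."""
    return int(np.argmax(Q[state]))

actions = [greedy_policy(Q, s) for s in range(3)]
print(actions)  # → [1, 0, 1]
```

Notice that the policy itself is never trained: it is a fixed rule (take the argmax), so an optimal Q-function immediately gives an optimal policy.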

There are two types of methods to update the value function:

- With *the Monte Carlo method*, we update the value function from a complete episode, and so we **use the actual discounted return of this episode.**
- With *the TD Learning method*, we update the value function from a single step, replacing the unknown $G_t$ with **an estimated return called the TD target.**
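The two update rules above can be sketched for a tabular state-value function. This is a minimal illustration, assuming $V$ is stored as a dict and episodes are lists of `(state, reward)` pairs; the function names and the learning rate `alpha` are assumptions, not from the course:

```python
def monte_carlo_update(V, episode, alpha=0.1, gamma=0.99):
    """Update V(s) toward the *actual* discounted return G_t,
    which requires a complete episode of (state, reward) pairs."""
    G = 0.0
    # Walk the episode backwards so G accumulates the discounted return.
    for state, reward in reversed(episode):
        G = reward + gamma * G
        V[state] = V[state] + alpha * (G - V[state])
    return V

def td_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Update V(s) toward the TD target r + gamma * V(s'),
    an *estimate* that replaces the unknown return G_t."""
    td_target = reward + gamma * V[next_state]
    V[state] = V[state] + alpha * (td_target - V[state])
    return V

# Monte Carlo needs the whole episode before it can update...
V = {0: 0.0, 1: 0.0, 2: 0.0}
V = monte_carlo_update(V, episode=[(0, 1.0), (1, 0.0), (2, 1.0)])

# ...while TD can update after a single step.
V = td_update(V, state=0, reward=1.0, next_state=1)
```

The key contrast is visible in the arguments: Monte Carlo consumes a full episode to compute the true return, while TD only needs one transition and bootstraps from its current estimate of the next state's value.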