The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.
Q1: What are the two main approaches to finding the optimal policy?
Q2: What is the Bellman Equation?
The Bellman equation is a recursive equation. Instead of starting from the beginning for each state and calculating the full return, we can express the value of any state as:
$R_{t+1} + \gamma \, V(S_{t+1})$
The immediate reward + the discounted value of the state that follows
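To make the recursion concrete, here is a minimal sketch on a made-up deterministic chain of three states. The reward list, the discount factor of 0.9, and the function names are all assumptions chosen for illustration. It compares summing the full discounted return from scratch with filling in values backwards using the immediate reward plus the discounted value of the next state.

```python
# Hypothetical deterministic chain: state i always moves to state i + 1
# and collects rewards[i] on the way. The state after the last one is terminal.
rewards = [1.0, 0.0, 2.0]  # illustrative rewards, not taken from the course
gamma = 0.9                # assumed discount factor


def discounted_return(start):
    """Sum every discounted reward from `start` until the episode ends."""
    return sum(gamma ** k * r for k, r in enumerate(rewards[start:]))


def bellman_values():
    """Fill values backwards: V(s) = immediate reward + gamma * V(next state)."""
    values = [0.0] * (len(rewards) + 1)  # the terminal state has value 0
    for s in reversed(range(len(rewards))):
        values[s] = rewards[s] + gamma * values[s + 1]
    return values


values = bellman_values()
for s in range(len(rewards)):
    # Both columns should agree up to floating-point rounding.
    print(s, round(values[s], 4), round(discounted_return(s), 4))
```

Each value is computed once and reused by the state before it, which is the saving the recursive form offers over recomputing the whole return for every state. In a stochastic environment the same idea holds in expectation, which this deterministic sketch does not show.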
Q3: Define each part of the Bellman Equation
Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?
Q5: Define each part of the Temporal Difference learning formula
Q6: Define each part of the Monte Carlo learning formula
Congrats on finishing this quiz 🥳! If you missed some elements, take the time to reread the previous sections to reinforce (😏) your knowledge.