Deep RL Course documentation

Mid-way Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What are the two main approaches to finding an optimal policy?

Q2: What is the Bellman Equation?


The Bellman equation is a recursive equation that works like this: instead of starting from the beginning for each state and calculating the full return, we can express the value of any state as:

R_{t+1} + gamma * V(S_{t+1})

That is, the immediate reward plus the discounted value of the state that follows.
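
As a minimal sketch of this recursion, the Python snippet below computes state values on a tiny deterministic chain by working backward from the last state. The three-state chain, its rewards, and the names `rewards`, `next_state`, and `bellman_values` are illustrative assumptions, not part of the course material.

```python
# Hypothetical 3-state deterministic chain: 0 -> 1 -> 2 -> end of episode.
GAMMA = 0.9

# Reward received when leaving each state (R_{t+1}); values are made up.
rewards = {0: 1.0, 1: 2.0, 2: 3.0}
# Successor of each state; None marks the end of the episode.
next_state = {0: 1, 1: 2, 2: None}


def bellman_values(rewards, next_state, gamma):
    """Compute V(s) = R_{t+1} + gamma * V(S_{t+1}) for every state."""
    values = {}
    # States are numbered in visiting order, so iterating in reverse
    # guarantees V(S_{t+1}) is already known when V(S_t) is computed.
    for state in sorted(rewards, reverse=True):
        successor = next_state[state]
        future = 0.0 if successor is None else values[successor]
        values[state] = rewards[state] + gamma * future
    return values


values = bellman_values(rewards, next_state, GAMMA)
```

In this toy chain, V(2) = 3, V(1) = 2 + 0.9 * 3 = 4.7, and V(0) = 1 + 0.9 * 4.7 = 5.23: each value is built from the value of the next state rather than by summing the whole return from scratch.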

Q3: Define each part of the Bellman Equation

[Image: Bellman equation quiz]
Solution: [Image: Bellman equation solution]

Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?

Q5: Define each part of the Temporal Difference learning formula

[Image: TD Learning exercise]
Solution: [Image: TD Exercise]
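
For reference, the standard TD(0) value update, V(S_t) <- V(S_t) + alpha * [R_{t+1} + gamma * V(S_{t+1}) - V(S_t)], can be sketched in a few lines of Python. The function name `td_update`, the dictionary-based value table, and the example numbers are assumptions made for illustration only.

```python
def td_update(value_table, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update in place and return the TD error."""
    # TD target: immediate reward plus discounted estimate of the next state.
    td_target = reward + gamma * value_table[next_state]
    # TD error: how far the current estimate is from the target.
    td_error = td_target - value_table[state]
    # Move the estimate a fraction alpha (learning rate) toward the target.
    value_table[state] += alpha * td_error
    return td_error


# Hypothetical two-state example: one step from "A" to "B" with reward 1.
td_values = {"A": 0.0, "B": 0.0}
td_error = td_update(td_values, "A", reward=1.0, next_state="B",
                     alpha=0.1, gamma=1.0)
```

The update happens after a single step, using the current estimate of the next state's value instead of waiting for the end of the episode.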

Q6: Define each part of the Monte Carlo learning formula

[Image: MC Learning exercise]
Solution: [Image: MC Exercise]
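
For reference, the standard Monte Carlo value update, V(S_t) <- V(S_t) + alpha * [G_t - V(S_t)], can be sketched as follows. This is an every-visit variant chosen for brevity; the function name `mc_update`, the `(state, reward)` episode format, and the example numbers are illustrative assumptions.

```python
def mc_update(value_table, episode, alpha=0.1, gamma=0.99):
    """Every-visit Monte Carlo update from one complete episode.

    episode is a list of (state, reward) pairs, where reward is the
    reward R_{t+1} received after leaving that state.
    """
    g = 0.0  # Return G_t, accumulated backward from the end of the episode.
    for state, reward in reversed(episode):
        g = reward + gamma * g
        # Move the estimate a fraction alpha toward the actual return.
        value_table[state] += alpha * (g - value_table[state])
    return value_table


# Hypothetical episode: "A" -> "B" -> end, with rewards 1 and 2.
mc_values = {"A": 0.0, "B": 0.0}
mc_update(mc_values, [("A", 1.0), ("B", 2.0)], alpha=0.1, gamma=1.0)
```

Unlike the TD update, this one needs the complete episode, because the target is the actual return G_t rather than an estimate.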

Congrats on finishing this quiz 🥳! If you missed some elements, take the time to reread the previous sections to reinforce (😏) your knowledge.