Second Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What is Q-Learning?

Q2: What is a Q-table?
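A Q-table can be pictured as a 2D array with one row per state and one column per action, where each cell stores the estimated value Q(s, a) of taking that action in that state. The minimal sketch below assumes a small discrete environment and uses NumPy; the helper name `make_q_table` and the sizes are hypothetical, chosen only for illustration.

```python
import numpy as np

def make_q_table(n_states: int, n_actions: int) -> np.ndarray:
    """Create a Q-table with one row per state and one column per action."""
    # Every Q(s, a) starts at 0.0; training will fill these values in.
    return np.zeros((n_states, n_actions))

# Hypothetical sizes, e.g. a 4x4 grid world with 4 moves (up, down, left, right).
q_table = make_q_table(n_states=16, n_actions=4)

print(q_table.shape)  # (16, 4)
print(q_table[5, 2])  # value of action 2 in state 5 -> 0.0
```

Initializing to zeros is a common choice; any starting values work as long as learning updates them.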

Q3: Why does having an optimal Q-function Q* give us an optimal policy?

Solution

If we have an optimal Q-function, we have an optimal policy, because for each state we know which action is best to take: the one with the highest Q*(s, a).

[Figure: link between value and policy]
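Formally, the optimal policy acts greedily with respect to Q*: π*(s) = argmax_a Q*(s, a). A minimal sketch, assuming Q* is available as a NumPy array indexed as `q_star[state, action]`; the function name `greedy_policy` and the values below are hypothetical and for illustration only.

```python
import numpy as np

def greedy_policy(q_star: np.ndarray) -> np.ndarray:
    """Return, for every state, the action with the highest Q*(s, a)."""
    # argmax along the action axis picks the best action in each state's row.
    return np.argmax(q_star, axis=1)

# Hypothetical Q* for 3 states and 2 actions (illustrative numbers only).
q_star = np.array([
    [1.0, 0.5],
    [0.2, 0.9],
    [0.3, 0.7],
])

policy = greedy_policy(q_star)
print(policy)  # [0 1 1]
```

No separate policy needs to be learned: reading the argmax of each row of Q* is enough.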

Q4: Can you explain what the Epsilon-Greedy Strategy is?

Solution

The Epsilon-Greedy Strategy is a policy that handles the exploration/exploitation trade-off.

The idea is that we start with epsilon ɛ = 1.0 and progressively reduce it as training goes on:

  • With probability 1 − ɛ: we do exploitation (our agent selects the action with the highest state-action value).
  • With probability ɛ: we do exploration (we try a random action).
[Figure: Epsilon Greedy]
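The two cases above can be sketched in Python. This is a minimal sketch assuming a NumPy Q-table indexed as `q_table[state, action]`; the function names are hypothetical, and the exponential decay schedule with its parameter values (`eps_min`, `decay_rate`) is an assumption chosen for illustration, not a prescribed setting.

```python
import random

import numpy as np

def epsilon_greedy_action(q_table: np.ndarray, state: int, epsilon: float) -> int:
    """Pick an action for `state` using the epsilon-greedy strategy."""
    if random.random() < epsilon:
        # Exploration (probability epsilon): try a random action.
        return random.randrange(q_table.shape[1])
    # Exploitation (probability 1 - epsilon): take the action with the highest Q-value.
    return int(np.argmax(q_table[state]))

def decayed_epsilon(episode: int, eps_start: float = 1.0, eps_min: float = 0.05,
                    decay_rate: float = 0.005) -> float:
    """Exponentially decay epsilon from eps_start toward eps_min (assumed schedule)."""
    return eps_min + (eps_start - eps_min) * np.exp(-decay_rate * episode)

# Hypothetical Q-table with a single state and three actions.
q = np.array([[0.0, 2.0, 1.0]])
print(epsilon_greedy_action(q, state=0, epsilon=0.0))  # always exploits -> 1
print(decayed_epsilon(0))  # starts at eps_start
```

Early in training ɛ is close to 1, so the agent mostly explores; as ɛ decays, it increasingly exploits what its Q-table has learned.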

Q5: How do we update the Q-value of a state-action pair?

[Figure: Q Update exercise]

Solution

[Figure: Q Update exercise solution]
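The Q-Learning update moves Q(S_t, A_t) toward the TD target R_{t+1} + γ max_a Q(S_{t+1}, a): Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)], where α is the learning rate and γ the discount factor. A minimal sketch for a tabular Q stored as a NumPy array; the function name `q_learning_update`, the `done` flag for terminal states, and the numbers in the usage example are assumptions for illustration.

```python
import numpy as np

def q_learning_update(q_table, state, action, reward, next_state,
                      alpha=0.1, gamma=0.99, done=False):
    """Apply one Q-Learning update to q_table in place and return the new Q(s, a)."""
    # TD target: reward plus discounted best next-state value
    # (no bootstrapping from a terminal state).
    best_next = 0.0 if done else np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    # TD error: how far the current estimate is from the target.
    td_error = td_target - q_table[state, action]
    # Move the estimate a fraction alpha of the way toward the target.
    q_table[state, action] += alpha * td_error
    return q_table[state, action]

# Hypothetical example: 2 states, 2 actions.
q = np.zeros((2, 2))
q[1] = [0.0, 1.0]  # pretend action 1 in state 1 is already valued at 1.0
new_value = q_learning_update(q, state=0, action=0, reward=1.0, next_state=1,
                              alpha=0.1, gamma=0.99)
print(new_value)  # approximately 0.199 = 0.1 * (1.0 + 0.99 * 1.0)
```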

Q6: What’s the difference between on-policy and off-policy?

Solution

[Figure: On/off policy]
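One way to see the difference is to compare the TD targets of two tabular methods. Q-Learning is off-policy: it acts with an epsilon-greedy policy but updates toward the greedy (max) next-state value. Sarsa, another value-based algorithm, is on-policy: its target uses the next action actually chosen by the same epsilon-greedy policy used for acting. A minimal sketch; the function names and the Q-table values are hypothetical.

```python
import numpy as np

def q_learning_target(q_table, reward, next_state, gamma=0.99):
    # Off-policy: the update uses the greedy policy (max over next actions),
    # regardless of which action the acting policy will actually take.
    return reward + gamma * np.max(q_table[next_state])

def sarsa_target(q_table, reward, next_state, next_action, gamma=0.99):
    # On-policy: the update uses the next action actually selected by the
    # acting (e.g. epsilon-greedy) policy.
    return reward + gamma * q_table[next_state, next_action]

# Hypothetical Q-table: in state 1, action 1 is the greedy choice.
q = np.array([[0.0, 0.0],
              [0.5, 2.0]])

# Suppose epsilon-greedy explored and picked action 0 in state 1.
print(q_learning_target(q, reward=0.0, next_state=1, gamma=1.0))  # 2.0
print(sarsa_target(q, reward=0.0, next_state=1, next_action=0, gamma=1.0))  # 0.5
```

The two targets only differ when the acting policy does not pick the greedy action, which is exactly when the acting and updating policies disagree.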

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to reread the chapter to reinforce (😏) your knowledge.