Deep RL Course documentation

Second Quiz

Join the Hugging Face community

to get started

# Second Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you to find where you need to reinforce your knowledge.

### Q3: Why if we have an optimal Q-function Q* we have an optimal policy?

Solution

Because if we have an optimal Q-function, we have an optimal policy since we know for each state what is the best action to take.

### Q4: Can you explain what is Epsilon-Greedy Strategy?

Solution Epsilon Greedy Strategy is a policy that handles the exploration/exploitation trade-off.

The idea is that we define epsilon ɛ = 1.0:

• With probability 1 — ɛ : we do exploitation (aka our agent selects the action with the highest state-action pair value).
• With probability ɛ : we do exploration (trying random action).

Solution

### Q6: What’s the difference between on-policy and off-policy

Solution

Congrats on finishing this Quiz 🥳, if you missed some elements, take time to read again the chapter to reinforce (😏) your knowledge.

< > Update on GitHub