Second Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What is Q-Learning?

Q2: What is a Q-table?

Q3: Why does having an optimal Q-function Q* give us an optimal policy?

Solution

If we have an optimal Q-function, we have an optimal policy, since we know for each state which action is best to take: we simply pick the action with the highest Q*-value, π*(s) = argmax_a Q*(s, a).
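As a small illustration, the sketch below assumes a tabular setting where the Q-function is stored as a NumPy array indexed by [state, action] (the values are made up, not a real learned Q*). The greedy policy is obtained by taking the argmax of each row.

```python
import numpy as np


def greedy_policy(q_table: np.ndarray, state: int) -> int:
    """Return the action with the highest Q-value in the given state."""
    return int(np.argmax(q_table[state]))


# Made-up "optimal" Q-table: 3 states (rows) x 2 actions (columns).
q_star = np.array([
    [1.0, 2.0],  # state 0: action 1 has the highest value
    [0.5, 0.1],  # state 1: action 0 has the highest value
    [3.0, 4.0],  # state 2: action 1 has the highest value
])

# Reading off the best action in every state gives the policy.
policy = [greedy_policy(q_star, s) for s in range(q_star.shape[0])]
print(policy)  # [1, 0, 1]
```

No extra learning is needed once Q* is known: the policy is just a lookup followed by an argmax.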

Q4: Can you explain what the Epsilon-Greedy Strategy is?

Solution

The Epsilon-Greedy Strategy is a policy that handles the exploration/exploitation trade-off.

The idea is that we start with epsilon ɛ = 1.0 and progressively reduce it as training goes on. At each step:

With probability 1 - ɛ: we do exploitation (our agent selects the action with the highest state-action value).
With probability ɛ: we do exploration (we try a random action).
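A minimal sketch of epsilon-greedy action selection, assuming a tabular Q-table stored as a NumPy array. The exponential decay schedule and its parameters (max_eps, min_eps, decay_rate) are hypothetical choices for illustration, not values prescribed by the course.

```python
import numpy as np


def epsilon_greedy_action(q_table: np.ndarray, state: int, epsilon: float,
                          rng: np.random.Generator) -> int:
    """Pick an action: random with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        # Exploration (probability epsilon): uniformly random action.
        return int(rng.integers(q_table.shape[1]))
    # Exploitation (probability 1 - epsilon): action with the highest Q-value.
    return int(np.argmax(q_table[state]))


def decayed_epsilon(episode: int, max_eps: float = 1.0, min_eps: float = 0.05,
                    decay_rate: float = 0.005) -> float:
    """Hypothetical exponential schedule: max_eps at episode 0, shrinking toward min_eps."""
    return float(min_eps + (max_eps - min_eps) * np.exp(-decay_rate * episode))


rng = np.random.default_rng(seed=0)
q_table = np.array([[0.0, 1.0, 0.2]])  # one state, three actions

for episode in (0, 500, 2000):
    eps = decayed_epsilon(episode)
    action = epsilon_greedy_action(q_table, state=0, epsilon=eps, rng=rng)
    print(f"episode={episode} epsilon={eps:.3f} action={action}")
```

Early on (ɛ close to 1.0) almost every action is random; as ɛ shrinks, the agent increasingly picks action 1, the highest-valued one.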

Q5: How do we update the Q-value of a state-action pair?

Solution
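The update asked about here is the Q-learning (temporal-difference) rule: Q(S_t, A_t) ← Q(S_t, A_t) + α [R_{t+1} + γ max_a Q(S_{t+1}, a) − Q(S_t, A_t)], where α is the learning rate and γ the discount factor. Below is a minimal sketch of that rule on a NumPy Q-table; the function name and the done flag (which drops the bootstrap term at terminal states) are our own assumptions.

```python
import numpy as np


def q_learning_update(q_table: np.ndarray, state: int, action: int, reward: float,
                      next_state: int, learning_rate: float, gamma: float,
                      done: bool = False) -> None:
    """Apply one Q-learning (TD) update to q_table in place."""
    # TD target: reward plus discounted value of the best next action
    # (no bootstrap term if the episode ended at next_state).
    best_next = 0.0 if done else np.max(q_table[next_state])
    td_target = reward + gamma * best_next
    # TD error: gap between the target and the current estimate.
    td_error = td_target - q_table[state, action]
    # Move the estimate a step of size learning_rate toward the target.
    q_table[state, action] += learning_rate * td_error


q_table = np.zeros((2, 2))   # 2 states x 2 actions, initialised to 0
q_table[1] = [0.5, 2.0]      # pretend we already know something about state 1

q_learning_update(q_table, state=0, action=0, reward=1.0, next_state=1,
                  learning_rate=0.5, gamma=0.5)
print(q_table[0, 0])  # 1.0 = 0 + 0.5 * (1.0 + 0.5 * 2.0 - 0)
```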

Q6: What’s the difference between on-policy and off-policy?

Solution
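Off-policy methods use a different policy for acting than for updating; on-policy methods use the same one. The sketch below contrasts the two update targets on a NumPy Q-table: Q-learning (off-policy) acts epsilon-greedily but bootstraps from the greedy max, while SARSA, an on-policy method brought in here only for comparison, bootstraps from the action the acting policy actually chose next. The function names are illustrative.

```python
import numpy as np


def sarsa_target(q_table: np.ndarray, reward: float, next_state: int,
                 next_action: int, gamma: float) -> float:
    # On-policy: bootstrap from the action the acting policy actually chose next.
    return float(reward + gamma * q_table[next_state, next_action])


def q_learning_target(q_table: np.ndarray, reward: float, next_state: int,
                      gamma: float) -> float:
    # Off-policy: bootstrap from the greedy action, regardless of what the agent does next.
    return float(reward + gamma * np.max(q_table[next_state]))


q_table = np.array([[0.0, 0.0],
                    [1.0, 3.0]])

# Suppose the epsilon-greedy acting policy explored and chose action 0 in state 1.
sarsa = sarsa_target(q_table, reward=0.0, next_state=1, next_action=0, gamma=0.5)
qlearn = q_learning_target(q_table, reward=0.0, next_state=1, gamma=0.5)
print(sarsa, qlearn)  # 0.5 1.5
```

When the acting policy picks the greedy action (action 1 here), both targets coincide; they differ only when it explores, which is exactly where the acting policy and the updating policy diverge.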

Congrats on finishing this quiz 🥳! If you missed some elements, take the time to read the chapter again to reinforce (😏) your knowledge.
