Deep RL Course documentation

Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What are the advantages of policy-gradient over value-based methods? (Check all that apply)

Q2: What is the Policy Gradient Theorem?

Solution

The Policy Gradient Theorem is a formula that lets us rewrite the gradient of the objective function in a form we can estimate, one that does not involve differentiating the state distribution.

[Figure: Policy Gradient]
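To make the solution above concrete, here is a minimal sketch under simplifying assumptions that are not part of the quiz: a one-step problem (a three-armed bandit) with a softmax policy, and hypothetical helper names (`objective`, `policy_gradient`, `finite_difference`, `gradient_ascent`). In this setting the theorem reduces to the expectation of ∇θ log πθ(a) · R(a), which can be summed exactly over actions and checked against a numerical derivative of J(θ).

```python
import math

def softmax(theta):
    # Numerically stable softmax over action preferences.
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    total = sum(exps)
    return [e / total for e in exps]

def objective(theta, rewards):
    # J(theta): expected reward under the softmax policy.
    probs = softmax(theta)
    return sum(p * r for p, r in zip(probs, rewards))

def policy_gradient(theta, rewards):
    # E_{a ~ pi}[ grad log pi(a) * R(a) ], summed exactly over actions.
    probs = softmax(theta)
    grad = [0.0] * len(theta)
    for a, (p_a, r_a) in enumerate(zip(probs, rewards)):
        for k in range(len(theta)):
            # For a softmax policy: d/d theta_k log pi(a) = 1[a == k] - pi(k).
            grad_log = (1.0 if a == k else 0.0) - probs[k]
            grad[k] += p_a * grad_log * r_a
    return grad

def finite_difference(theta, rewards, eps=1e-6):
    # Central-difference estimate of dJ/d theta, used only as a sanity check.
    grad = []
    for k in range(len(theta)):
        up, down = list(theta), list(theta)
        up[k] += eps
        down[k] -= eps
        grad.append((objective(up, rewards) - objective(down, rewards)) / (2 * eps))
    return grad

def gradient_ascent(theta, rewards, lr=0.1, steps=1000):
    # Move theta in the direction of the gradient to increase J(theta).
    theta = list(theta)
    for _ in range(steps):
        g = policy_gradient(theta, rewards)
        theta = [t + lr * g_k for t, g_k in zip(theta, g)]
    return theta

rewards = [1.0, 0.0, 0.5]   # illustrative reward per action (assumption)
theta0 = [0.0, 0.0, 0.0]    # uniform initial policy

pg = policy_gradient(theta0, rewards)
fd = finite_difference(theta0, rewards)
theta_star = gradient_ascent(theta0, rewards)

print("policy-gradient form:", pg)
print("finite differences:  ", fd)
print("J before:", objective(theta0, rewards), "J after:", objective(theta_star, rewards))
```

The two gradient vectors should agree up to numerical error, and the expected reward should rise as the policy shifts probability toward the best action. In a multi-step environment the expectation would be estimated from sampled trajectories rather than summed exactly.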

Q3: What’s the difference between policy-based methods and policy-gradient methods? (Check all that apply)

Q4: Why do we use gradient ascent instead of gradient descent to optimize J(θ)?

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to read the chapter again to reinforce (😏) your knowledge.
