Deep RL Course documentation

Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you to find where you need to reinforce your knowledge.

Q1: What is Reinforcement Learning?

Solution

Reinforcement learning is a framework for solving control tasks (also called decision problems) by building agents that learn from the environment by interacting with it through trial and error and receiving rewards (positive or negative) as their only feedback.

Q2: Define the RL Loop

Figure: the RL loop exercise

At every step:

  • Our Agent receives __ from the environment
  • Based on that __ the Agent takes an __
  • For example, our Agent moves to the right
  • The Environment goes to a __
  • The Environment gives a __ to the Agent
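Once you have filled in the blanks, it can help to see the same loop as code. The following is only a minimal sketch: `CorridorEnv`, its `reset`/`step` methods, the reward of 1.0 at the goal, and the `always_right` policy are all hypothetical names and choices made up for illustration, not part of the course material.

```python
class CorridorEnv:
    """Hypothetical toy environment: a corridor with cells 0..4; the goal is cell 4."""

    def __init__(self):
        self.position = 0

    def reset(self):
        self.position = 0
        return self.position  # what the agent receives at the start

    def step(self, action):
        # Assumed action encoding: 0 = move left, 1 = move right.
        move = 1 if action == 1 else -1
        self.position = max(0, min(4, self.position + move))
        done = self.position == 4
        reward = 1.0 if done else 0.0  # assumed reward: 1.0 only when the goal is reached
        return self.position, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and record every step of the interaction."""
    state = env.reset()
    total_reward = 0.0
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                       # the agent picks an action
        next_state, reward, done = env.step(action)  # the environment responds
        trajectory.append((state, action, reward, next_state))
        total_reward += reward
        state = next_state
        if done:
            break
    return total_reward, trajectory


def always_right(state):
    """Hypothetical fixed policy: always move to the right."""
    return 1


total, trajectory = run_episode(CorridorEnv(), always_right)
```

Each tuple in `trajectory` records one pass through the loop, so printing it is a quick way to check your answers to the exercise.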

Q3: What’s the difference between a state and an observation?

Q4: A task is an instance of a Reinforcement Learning problem. What are the two types of tasks?

Q5: What is the exploration/exploitation tradeoff?

Solution

In Reinforcement Learning, we need to balance how much we explore the environment and how much we exploit what we know about the environment.

  • Exploration means trying random actions in order to find more information about the environment.

  • Exploitation means using known information to maximize the reward.

Figure: the exploration/exploitation tradeoff
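One common way to balance the two is an epsilon-greedy rule, sketched below. This strategy is not named in the solution above, so treat it as an assumption: the function name `epsilon_greedy`, the `epsilon` parameter, and the example value estimates are all illustrative.

```python
import random


def epsilon_greedy(value_estimates, epsilon, rng):
    """Return an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: try a random action to learn more about the environment.
        return rng.randrange(len(value_estimates))
    # Exploitation: use what we already know and take the best-looking action.
    return max(range(len(value_estimates)), key=lambda a: value_estimates[a])


rng = random.Random(0)
estimates = [0.2, 0.8, 0.5]  # hypothetical estimated rewards for three actions

# epsilon = 0.0 never explores, epsilon = 1.0 always explores.
greedy_choice = epsilon_greedy(estimates, epsilon=0.0, rng=rng)
random_choices = [epsilon_greedy(estimates, epsilon=1.0, rng=rng) for _ in range(200)]
```

Values of `epsilon` between 0 and 1 mix the two behaviors; a larger value means more exploration.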

Q6: What is a policy?

Solution
  • The Policy π is the brain of our Agent. It’s the function that tells us what action to take given the state we are in. So it defines the agent’s behavior at a given time.
Figure: the policy
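In its simplest form, a policy is just a function from states to actions. The sketch below uses a lookup table; the state and action names are hypothetical and chosen only for illustration.

```python
# Hypothetical states and actions for a tiny task.
policy_table = {
    "start": "right",
    "middle": "right",
    "near_goal": "up",
}


def policy(state):
    """Deterministic policy: given the current state, return the action to take."""
    return policy_table[state]


actions = [policy(s) for s in ["start", "middle", "near_goal"]]
```

In practice the table is usually replaced by a learned function, but the interface stays the same: a state goes in, an action comes out.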

Q7: What are value-based methods?

Solution
  • Value-based methods are one of the main approaches for solving RL problems.
  • In value-based methods, instead of training a policy function, we train a value function that maps a state to the expected value of being at that state.
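As a rough illustration, a value function can be a table of numbers, one per state. The sketch below hard-codes made-up values for a 5-cell corridor and then acts by moving toward the neighboring state with the highest value. That greedy rule and the deterministic `next_state` transition are assumptions of this example, not something stated in the solution above.

```python
# Hypothetical state values V(s) for a corridor with cells 0..4 (goal at cell 4).
# In a real value-based method these numbers would be learned, not hard-coded.
state_values = {0: 0.66, 1: 0.73, 2: 0.81, 3: 0.9, 4: 1.0}


def next_state(state, action):
    """Assumed deterministic transition: move left or right, clipped to the corridor."""
    step = 1 if action == "right" else -1
    return max(0, min(4, state + step))


def greedy_action(state):
    """Pick the action that leads to the state with the highest value."""
    return max(["left", "right"], key=lambda a: state_values[next_state(state, a)])


chosen = [greedy_action(s) for s in range(4)]
```

Note that no policy is stored anywhere: the behavior falls out of the value table.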

Q8: What are policy-based methods?

Solution
  • In Policy-Based Methods, we learn a policy function directly.
  • This policy function maps each state either to the best corresponding action at that state, or to a probability distribution over the set of possible actions at that state.
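Both variants can be sketched with a small parameterized policy. The example below assumes a softmax over per-state action preferences, which is one common choice but not the only one; the `preferences` table, the state names, and the action names are hypothetical stand-ins for learned parameters.

```python
import math
import random


def softmax(preferences):
    """Turn real-valued preferences into probabilities that sum to 1."""
    highest = max(preferences)
    exps = [math.exp(p - highest) for p in preferences]  # subtract max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


ACTIONS = ["left", "right"]
# Hypothetical learned parameters: one preference per action, for each state.
preferences = {"s0": [2.0, 0.0], "s1": [0.0, 0.0]}


def action_probabilities(state):
    """Stochastic policy: a probability distribution over actions at this state."""
    return softmax(preferences[state])


def sample_action(state, rng):
    """Draw one action according to the policy's probabilities."""
    return rng.choices(ACTIONS, weights=action_probabilities(state), k=1)[0]


def best_action(state):
    """Deterministic variant: the single most probable action at this state."""
    probs = action_probabilities(state)
    return ACTIONS[probs.index(max(probs))]


probs_s0 = action_probabilities("s0")
probs_s1 = action_probabilities("s1")
sampled = sample_action("s0", random.Random(0))
```

Training a policy-based method would adjust the numbers in `preferences` directly, without going through a value table.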

Congrats on finishing this Quiz 🥳! If you missed some elements, take time to reread the chapter to reinforce (😏) your knowledge. But do not worry: during the course we’ll go over these concepts again, and you’ll reinforce your theoretical knowledge with hands-on practice.