Deep RL Course documentation

Mid-way Quiz

The best way to learn and to avoid the illusion of competence is to test yourself. This will help you find where you need to reinforce your knowledge.

Q1: What are the two main approaches to finding the optimal policy?

Q2: What is the Bellman Equation?

Solution

The Bellman equation is a recursive equation that works like this: instead of starting from the beginning for each state and calculating the full return, we can express the value of any state as:

$R_{t+1} + \gamma \, V(S_{t+1})$

That is, the immediate reward plus the discounted value of the state that follows.
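
To make the recursion concrete, here is a minimal sketch on a hypothetical three-state chain (s0 -> s1 -> s2, where s2 is terminal). The chain, its rewards, and the names `GAMMA`, `rewards`, `value_from_scratch`, and `values_bellman` are illustrative assumptions, not part of the course material. It compares summing the discounted return from scratch for a state with reusing the value of the next state.

```python
# Hypothetical deterministic chain: s0 -> s1 -> s2 (terminal).
GAMMA = 0.9
rewards = [1.0, 2.0]  # rewards[i] is the reward received when leaving state i
n_states = 3


def value_from_scratch(state):
    """Sum the discounted rewards from `state` until the end of the episode."""
    return sum(GAMMA ** k * r for k, r in enumerate(rewards[state:]))


def values_bellman():
    """Compute each value as: immediate reward + gamma * value of the next state."""
    values = [0.0] * n_states  # the terminal state keeps a value of 0
    for state in reversed(range(n_states - 1)):
        values[state] = rewards[state] + GAMMA * values[state + 1]
    return values


values = values_bellman()
for s in range(n_states):
    print(s, values[s], value_from_scratch(s))
```

Both approaches give the same numbers here, but the recursive version reuses V(s1) when computing V(s0) instead of recomputing the whole return.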

Q3: Define each part of the Bellman Equation

[Figure: Bellman equation quiz]

Solution

[Figure: Bellman equation solution]

Q4: What is the difference between Monte Carlo and Temporal Difference learning methods?

Q5: Define each part of the Temporal Difference learning formula

[Figure: TD Learning exercise]

Solution

[Figure: TD Exercise]
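
As a rough illustration of the parts of the formula, the sketch below implements the standard TD(0) update, V(St) <- V(St) + alpha * [Rt+1 + gamma * V(St+1) - V(St)]. The function name `td_update`, the dictionary-based value table, and the example states "A" and "B" are assumptions made for this example only.

```python
def td_update(values, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Apply one TD(0) update to values[state] after a single step."""
    td_target = reward + gamma * values[next_state]  # Rt+1 + gamma * V(St+1)
    td_error = td_target - values[state]             # TD target minus the old estimate
    values[state] += alpha * td_error                # move the estimate by a fraction alpha
    return values[state]


# Hypothetical step: from state "A" we receive reward 1.0 and land in state "B".
values = {"A": 0.0, "B": 0.0}
new_value = td_update(values, "A", reward=1.0, next_state="B", alpha=0.1, gamma=1.0)
print(new_value)
```

Note that the update happens after a single step: it needs only the reward and the current estimate of the next state's value, not the full episode.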

Q6: Define each part of the Monte Carlo learning formula

[Figure: MC Learning exercise]

Solution

[Figure: MC Exercise]
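
For comparison with TD learning, here is a minimal sketch of the standard Monte Carlo update, V(St) <- V(St) + alpha * [Gt - V(St)], applied once a full episode has been collected. The helper names `discounted_returns` and `mc_update`, the every-visit variant, and the two-state episode are assumptions chosen for illustration.

```python
def discounted_returns(rewards, gamma=0.99):
    """Return Gt for every time step, where rewards[t] is the reward after step t."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g  # Gt = Rt+1 + gamma * Gt+1
        returns[t] = g
    return returns


def mc_update(values, episode_states, rewards, alpha=0.1, gamma=0.99):
    """Every-visit Monte Carlo update: move each V(St) toward the observed return Gt."""
    returns = discounted_returns(rewards, gamma)
    for state, g in zip(episode_states, returns):
        values[state] += alpha * (g - values[state])
    return values


# Hypothetical episode: visit "A" then "B", receiving rewards 1.0 and 2.0.
values = mc_update({"A": 0.0, "B": 0.0}, ["A", "B"], [1.0, 2.0], alpha=0.1, gamma=1.0)
print(values)
```

Unlike the TD update, nothing can be updated until the episode ends, because the target Gt is the actual return rather than an estimate.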

Congrats on finishing this quiz 🥳! If you missed some elements, take time to reread the previous sections to reinforce (😏) your knowledge.
