Deep RL Course documentation

Type of tasks

A task is an instance of a Reinforcement Learning problem. We can have two types of tasks: episodic and continuing.

Episodic task

In this case, we have a starting point and an ending point (a terminal state). This creates an episode: a list of States, Actions, Rewards, and new States.

For instance, think about Super Mario Bros: an episode begins at the launch of a new Mario level and ends when you're killed or you reach the end of the level.

Mario — Beginning of a new episode.
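As a minimal sketch of what "a list of States, Actions, Rewards, and new States" can look like in code, the example below runs one episode in a toy corridor world. The `CorridorEnv` class, its `reset`/`step` methods, and the random policy are hypothetical and exist only for illustration; they are not part of the course material or of any library.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.position = 0

    def reset(self):
        # Starting point of every episode.
        self.position = 0
        return self.position

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped at 0.
        self.position = max(0, self.position + action)
        # Reaching the last cell is the terminal state.
        terminated = self.position == self.length - 1
        reward = 1.0 if terminated else 0.0
        return self.position, reward, terminated


def run_episode(env, rng):
    """Collect (state, action, reward, next_state) tuples until the terminal state."""
    episode = []
    state = env.reset()
    terminated = False
    while not terminated:
        action = rng.choice([-1, 1])  # random policy, for illustration only
        next_state, reward, terminated = env.step(action)
        episode.append((state, action, reward, next_state))
        state = next_state
    return episode


episode = run_episode(CorridorEnv(length=5), random.Random(0))
print(len(episode), "transitions; last one:", episode[-1])
```

The loop ends only because the environment reports a terminal state, which is what makes the task episodic.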

Continuing task

These are tasks that continue forever (no terminal state). In this case, the agent must learn how to choose the best actions while simultaneously interacting with the environment.

For instance, consider an agent that does automated stock trading. For this task, there is no starting point and no terminal state. The agent keeps running until we decide to stop it.

Stock Market
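The contrast with an episodic task can be sketched as follows, assuming a made-up price feed. The `price_stream` generator, the `run_continuing` function, and the naive hold-after-a-rise policy are all hypothetical placeholders, not a trading strategy or a real API. The point is only that the stream never signals a terminal state, so the loop stops because we impose a step budget from the outside.

```python
import itertools
import random


def price_stream(rng, start=100.0):
    """Hypothetical endless price feed: a random walk that never terminates."""
    price = start
    while True:
        price = max(1.0, price + rng.uniform(-1.0, 1.0))
        yield price


def run_continuing(stream, max_steps):
    """Interact with an endless stream; stop only because we chose a step budget."""
    holding = False
    total_reward = 0.0
    previous = next(stream)
    steps = 0
    for price in itertools.islice(stream, max_steps):
        # Reward is the price change while holding the asset, otherwise 0.
        reward = (price - previous) if holding else 0.0
        total_reward += reward
        # Naive placeholder policy: hold after a rise, stay out after a fall.
        holding = price > previous
        previous = price
        steps += 1
    return steps, total_reward


stream = price_stream(random.Random(0))
steps, total_reward = run_continuing(stream, max_steps=1000)
print(steps, "steps taken before we stopped the agent")
```

Calling `next(stream)` after the loop would still return a new price: the task itself has not ended, we have only stopped the agent.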

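To recap:

An episodic task has a starting point and a terminal state, while a continuing task has no terminal state and runs until we decide to stop it.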
