In this advanced topic, we address the question: how should we monitor and keep track of powerful reinforcement learning agents that we are training in the real world and interfacing with humans?
As machine learning systems have increasingly impacted modern life, the call for the documentation of these systems has grown.
Such documentation can cover aspects such as the training data used — where it is stored, when it was collected, who was involved, etc. — or the model optimization framework — the architecture, evaluation metrics, relevant papers, etc. — and more.
Today, model cards and datasheets are becoming increasingly available, for example on the Hub (see documentation here).
If you click on a popular model on the Hub, you can learn about its creation process.
These model- and data-specific logs are designed to be completed when the model or dataset is created, so they tend to go un-updated when the models are later built into evolving systems.
Reinforcement learning systems are fundamentally designed to optimize based on measurements of reward and time. While the notion of a reward function can be mapped nicely to many well-understood fields of supervised learning (via a loss function), our understanding of how machine learning systems evolve over time is limited.
To that end, the authors introduce Reward Reports for Reinforcement Learning (the pithy naming is designed to mirror the popular papers Model Cards for Model Reporting and Datasheets for Datasets). The goal is to propose a type of documentation focused on the human factors of reward and time-varying feedback systems.
Reward Reports are living documents for proposed RL deployments that demarcate design choices.
However, many questions remain about the applicability of this framework to different RL applications, roadblocks to system interpretability, and the resonances between deployed supervised machine learning systems and the sequential decision-making utilized in RL.
At a minimum, Reward Reports are an opportunity for RL practitioners to deliberate on these questions and begin the work of deciding how to resolve them in practice.
The core piece specific to documentation designed for RL and feedback-driven ML systems is a change-log. The change-log updates information from the designer (changed training parameters, data, etc.) along with noticed changes from the user (harmful behavior, unexpected responses, etc.).
The change-log is accompanied by update triggers: conditions that, when met, prompt the designers to revisit the report, which encourages ongoing monitoring of these effects.
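As a rough illustration of the change-log idea, here is a minimal Python sketch of how such a log could be kept in code. It is not taken from the Reward Reports paper or template: the class names (`ChangeLogEntry`, `RewardReportLog`), the field names, the two source labels, and the dates are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical labels for who reported a change (not from the paper).
VALID_SOURCES = ("designer", "user")


@dataclass
class ChangeLogEntry:
    """One dated entry in an illustrative Reward Report change-log."""
    when: date
    source: str        # "designer" or "user"
    description: str   # what was changed, or what behavior was noticed


@dataclass
class RewardReportLog:
    """Illustrative container: update triggers plus the change-log itself."""
    update_triggers: List[str] = field(default_factory=list)
    entries: List[ChangeLogEntry] = field(default_factory=list)

    def record(self, when: date, source: str, description: str) -> ChangeLogEntry:
        """Append a new entry, rejecting unknown sources."""
        if source not in VALID_SOURCES:
            raise ValueError(f"unknown source: {source!r}")
        entry = ChangeLogEntry(when, source, description)
        self.entries.append(entry)
        return entry

    def by_source(self, source: str) -> List[ChangeLogEntry]:
        """Return the entries reported by one kind of source."""
        return [e for e in self.entries if e.source == source]


# Example usage with placeholder dates and descriptions.
log = RewardReportLog(
    update_triggers=["reward function modified", "user reports harmful behavior"]
)
log.record(date(2022, 5, 1), "designer", "changed training parameters")
log.record(date(2022, 5, 9), "user", "unexpected responses observed")
```

Keeping designer-side and user-side entries in the same list, tagged by source, is one possible design choice; it makes it easy to read the history in order while still filtering by who reported each change.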
Some of the most impactful RL-driven systems are multi-stakeholder in nature and behind the closed doors of private corporations. These corporations are largely without regulation, so the burden of documentation falls on the public.
If you are interested in contributing, we are building Reward Reports for popular machine learning systems on a public record on GitHub. For further reading, you can visit the Reward Reports paper or look at an example report.
This section was written by Nathan Lambert.