{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tune GPT2 to generate positive reviews\n", "> Optimise GPT2 to produce positive IMDB movie reviews using a BERT sentiment classifier as a reward function." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
Figure: Experiment setup to tune GPT2. The yellow arrows are outside the scope of this notebook, but the trained models are available through Hugging Face.
\n", "Figure: Reward mean and distribution evolution during training.
\n", "