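This GPT-2 model was fine-tuned with Proximal Policy Optimization (PPO) on the IMDB dataset. The reward signal came from a BERT sentiment classifier that scores how positive a generated review is. PPO updated only the GPT-2 generator; the BERT reward model was used only to score its outputs.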
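The following is a minimal sketch of how the two pieces fit together. It is not the exact training code. It makes three assumptions that the description above does not state. First, the BERT classifier returns logits in the order `[negative, positive]`. Second, the raw positive-class logit is used directly as the scalar reward. Third, GPT-2 is updated with the standard PPO clipped surrogate objective. The function names and the `clip_eps` default of 0.2 are hypothetical and chosen only for illustration.

```python
import math


def sentiment_reward(logits):
    """Hypothetical reward: `logits` is assumed to be [negative, positive]
    from a BERT sentiment classifier; the positive-class logit is the reward."""
    return logits[1]


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one action (to be maximized).

    logp_new / logp_old: log-probabilities of the sampled token under the
    current and the behaviour (pre-update) GPT-2 policy.
    advantage: advantage estimate derived from the sentiment reward.
    """
    ratio = math.exp(logp_new - logp_old)
    # Keep the probability ratio inside [1 - eps, 1 + eps].
    clipped_ratio = max(min(ratio, 1.0 + clip_eps), 1.0 - clip_eps)
    # Take the pessimistic (smaller) of the unclipped and clipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    r = sentiment_reward([-1.0, 2.5])  # a fairly positive review
    print(r, ppo_clipped_objective(math.log(2.0), 0.0, advantage=1.0))
```

The clipping means a single update can push the ratio at most `clip_eps` away from 1 when the advantage is positive. This keeps GPT-2 from drifting too far from its previous policy while it learns to write more positive text.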