
This model is GPT-2 fine-tuned on the IMDB dataset with Proximal Policy Optimization (PPO). The goal is to train the model to generate reviews with positive sentiment. Training uses the trl library for reinforcement learning, the transformers library for model handling, and the datasets library for dataset management. The implementation code is available on GitHub.
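For readers unfamiliar with PPO, the sketch below shows the textbook per-sample clipped surrogate objective in plain Python. It is an illustration only, not the training code behind this model: the function name `ppo_clipped_objective` and the default `clip_eps=0.2` are assumptions chosen for the example, and a full trainer such as the one in trl adds further terms that are omitted here.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Textbook PPO clipped surrogate for one sample (a quantity to maximize).

    logp_new / logp_old: log-probability of the sampled action (token) under
    the current and the pre-update policy.
    advantage: estimated advantage of that action.
    clip_eps: hypothetical default; the value used for this model is not stated.
    """
    # Probability ratio between the current and the old policy
    ratio = math.exp(logp_new - logp_old)
    # Keep the ratio inside [1 - eps, 1 + eps]
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    # Take the more pessimistic of the unclipped and clipped terms
    return min(ratio * advantage, clipped_ratio * advantage)


# An unchanged policy (ratio = 1) simply returns the advantage
unchanged = ppo_clipped_objective(-1.0, -1.0, advantage=2.0)
# A much more likely action with positive advantage is capped at (1 + eps) * A
capped = ppo_clipped_objective(0.0, -1.0, advantage=1.0)
```

The clipping removes the incentive to move the policy far from the one that generated the samples in a single update, which is the property that makes PPO a common choice for fine-tuning language models against a reward signal.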

# Load model and tokenizer directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("pt-sk/GPT2-IMDB-Sentiment-FineTuning-with-PPO")
model = AutoModelForCausalLM.from_pretrained("pt-sk/GPT2-IMDB-Sentiment-FineTuning-with-PPO")

# Example usage
input_text = "The movie was fantastic"
inputs = tokenizer(input_text, return_tensors='pt')
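# Set the continuation length explicitly (the default is short) and reuse the
# EOS token for padding, since GPT-2 does not define a pad token
outputs = model.generate(**inputs, max_new_tokens=30, pad_token_id=tokenizer.eos_token_id)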

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
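
Greedy decoding returns the same continuation on every call. To inspect several different continuations of one prompt, sampling can be enabled instead. The helper below is a minimal sketch: `sample_reviews` is a hypothetical name, and the values `top_p=0.95`, `num_samples=3`, and `max_new_tokens=40` are illustrative assumptions, not settings used during training.

```python
def sample_reviews(model, tokenizer, prompt, num_samples=3, max_new_tokens=40):
    """Return `num_samples` sampled continuations of `prompt` as strings.

    `model` and `tokenizer` are expected to be the objects loaded with
    AutoModelForCausalLM / AutoTokenizer; all defaults are illustrative.
    """
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(
        **inputs,
        do_sample=True,                       # sample instead of greedy decoding
        top_p=0.95,                           # nucleus sampling threshold (assumed value)
        max_new_tokens=max_new_tokens,
        num_return_sequences=num_samples,
        pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no dedicated pad token
    )
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)


# Usage, with `model` and `tokenizer` loaded as shown earlier:
# for review in sample_reviews(model, tokenizer, "The movie was"):
#     print(review)
```

Because the outputs are sampled, they differ from run to run; reading a handful of samples gives a quick qualitative check of whether the continuations lean positive.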

Model size: 124M params (Safetensors) · Tensor type: F32