Review Training Bot

This model was trained to generate scores and reviews for any given movie. It is fine-tuned from distilgpt2 as a baseline, using a custom dataset of roughly 120k reviews scraped from Letterboxd. The current version reliably produces the correct output format but often generates gibberish. Further training will hopefully improve coherence. The model is currently at version 0.1.

Intended uses & limitations

This model is intended to be used for entertainment.

This model shares most of the limitations of distilgpt2, which are documented at https://huggingface.co/distilgpt2. These may include persistent biases. A further issue is Letterboxd-specific language that the model may not understand. For example, reviews of an LGBT+ film on Letterboxd may use the word "gay" positively; the model has not learned this contextual usage and may use the word as a slur. Because the current model also struggles to connect movie titles with their reviews, this can happen with any movie entered.

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 10
eval_batch_size: 20
seed: 42
distributed_type: multi-GPU
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 100
training_steps: 5000
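The scheduler entries above (lr_scheduler_type: linear, lr_scheduler_warmup_steps: 100, training_steps: 5000) describe a linear warmup followed by a linear decay. The following is a minimal plain-Python sketch of that shape, assuming it matches the Transformers linear schedule: the learning rate ramps from 0 to the peak of 0.001 over the first 100 steps, then decays linearly to 0 at step 5000. The function name `linear_warmup_lr` is hypothetical and not part of any library.

```python
def linear_warmup_lr(step: int, base_lr: float = 1e-3,
                     warmup_steps: int = 100, total_steps: int = 5000) -> float:
    """Learning rate at `step` under linear warmup, then linear decay to zero.

    Hypothetical helper; assumes the same shape as the Transformers
    "linear" scheduler with the hyperparameters listed in this card.
    """
    if step < warmup_steps:
        # Warmup: ramp linearly from 0 up to base_lr over warmup_steps steps.
        return base_lr * step / max(1, warmup_steps)
    # Decay: fall linearly from base_lr (end of warmup) to 0 at total_steps,
    # clamped so steps past total_steps never yield a negative rate.
    remaining = total_steps - step
    return base_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


if __name__ == "__main__":
    # Print the learning rate at a few representative steps.
    for step in (0, 50, 100, 2550, 5000):
        print(step, linear_warmup_lr(step))
```

With these settings the learning rate peaks at 0.001 at step 100 and has fallen to half that value by step 2550.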

Framework versions

Transformers 4.21.2
Pytorch 1.12.1+cu113
Tokenizers 0.12.1