Model Card for Model ID

Sentiment analysis model that predicts a star rating for a given review; labels range from 1 star to 5 stars.

Model Details

Model Description

This is the model card of a 🤗 transformers model that has been pushed to the Hub. This model card has been automatically generated.

  • Language(s) (NLP): English
  • Finetuned from model [optional]: juliensimon/reviews-sentiment-analysis

Training Details

Training Data

The YelpReviewFull dataset consists of reviews from Yelp. It was constructed by Xiang Zhang (xiang.zhang@nyu.edu) from the Yelp Dataset Challenge 2015.

It was first used as a text classification benchmark in the following paper:

Xiang Zhang, Junbo Zhao, Yann LeCun. Character-level Convolutional Networks for Text Classification. Advances in Neural Information Processing Systems 28 (NIPS 2015).

The dataset is available on the Hugging Face Hub as yelp_review_full.
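
It can be loaded with the 🤗 datasets library:

from datasets import load_dataset

# Splits: train (650,000 reviews) and test (50,000 reviews)
ds = load_dataset("yelp_review_full")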

Preprocessing [optional]

Preprocessing steps include removing punctuation, removing stopwords, lemmatizing, and padding.
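
A minimal sketch of these steps, assuming NLTK for stopword removal and lemmatization (the exact libraries are not stated in this card; padding is applied later by the tokenizer):

import string

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def clean_review(text):
    # Strip punctuation, lowercase, drop stopwords, lemmatize the rest
    text = text.translate(str.maketrans("", "", string.punctuation)).lower()
    tokens = [lemmatizer.lemmatize(t) for t in text.split() if t not in stop_words]
    return " ".join(tokens)

# Padding happens at tokenization time, e.g.:
# tokenizer(batch["text"], padding="max_length", truncation=True)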

Training Hyperparameters

The full set of TrainingArguments used for training is shown below.

Evaluation

The performance metrics of the optimized model were Accuracy, Precision, Recall, and F1-score.

Evaluation Results:
{'eval_loss': 0.773500382900238, 'eval_accuracy': 0.684, 'eval_f1': 0.6833543859772582, 'eval_runtime': 98.6782, 'eval_samples_per_second': 5.067, 'eval_steps_per_second': 0.638}
Classification Report:
              precision    recall  f1-score   support

      1 star       0.79      0.78      0.79       110
      2 star       0.64      0.69      0.66       112
     3 stars       0.70      0.67      0.69        92
     4 stars       0.62      0.56      0.59       100
     5 stars       0.66      0.71      0.68        86

    accuracy                           0.68       500
   macro avg       0.68      0.68      0.68       500
weighted avg       0.68      0.68      0.68       500
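
The compute_metrics function referenced in the Trainer setup below is not included in this card; a minimal sketch that would produce the reported accuracy and weighted F1, assuming scikit-learn:

import numpy as np
from sklearn.metrics import accuracy_score, f1_score

def compute_metrics(eval_pred):
    # eval_pred is a (logits, labels) pair supplied by the Trainer
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),
    }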

Hyperparameter Optimization

The hyperparameters changed to optimize this model were the following:

from transformers import Trainer, TrainingArguments, EarlyStoppingCallback

training_args = TrainingArguments(
    output_dir=repo_name,
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.1,
    eval_strategy="epoch",        # evaluate at the end of every epoch
    save_strategy="epoch",        # checkpoint at the end of every epoch
    load_best_model_at_end=True,  # restore the best checkpoint when training ends
    logging_dir='/content/drive/My Drive/Colab Notebooks/LLM Project GoogleColab/Logs_Full',
    logging_steps=10,
    push_to_hub=True,
    report_to="none"
)

optimized_trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_ds['train'],
    eval_dataset=tokenized_ds['test'],
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    # Stop early if the eval loss improves by less than 0.001 over one evaluation round
    callbacks=[EarlyStoppingCallback(early_stopping_patience=1, early_stopping_threshold=0.001)]
)
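
Training is then launched with the standard Trainer calls; with push_to_hub=True, the resulting model can be uploaded afterwards:

optimized_trainer.train()
optimized_trainer.push_to_hub()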

The most important hyperparameter for this optimization was the Learning Rate, which was changed from 5e-1 to 1e-1, and finally set to 2e-5. A learning rate that is too high can cause the model to converge too quickly to a suboptimal solution, while one that is too low can result in slow convergence and long training times. The final learning rate (2e-5) was a compromise between the risk of suboptimal performance and training time.

Another hyperparameter changed was the number of training Epochs, which controls how many times the model sees the entire training dataset. Too few epochs may lead to underfitting, while too many can lead to overfitting. To avoid overfitting, a technique called Early Stopping was used: training halts when the model's performance on the validation set stops improving, ensuring that the model does not continue training beyond the point where it is making significant progress.

Another important consideration was the Weight Decay hyperparameter, as it provides regularization that helps avoid overfitting.

Hyperparameters important for memory usage and speed

The following hyperparameters helped avoid losing valuable training progress when the Colab notebook disconnected from the hosted runtime due to inactivity or to reaching the maximum available RAM:

  1. The Per Device Evaluation Batch Size directly affected speed and memory usage during evaluation.

  2. The Evaluation Strategy was set to 'epoch' so the model would be evaluated on the validation set every time an epoch was completed.

  3. The Save Strategy was set to 'epoch' so the model's state would be saved after every completed epoch.

Even if the notebook disconnected, training could be restarted from the last saved checkpoint.
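
With the Trainer API, that recovery is a one-liner: resume_from_checkpoint=True picks up the most recent checkpoint saved in output_dir.

# Resume training from the latest checkpoint in output_dir
optimized_trainer.train(resume_from_checkpoint=True)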

Results

text                                                                   label     score
This restaurant was the best ever, I really enjoyed the food there!   5 stars   0.967317
I would recommend this to my family and friends!                       4 stars   0.530670
Not that big of a deal, I don't know what everyone is talking about.  3 stars   0.626009
It was okay, not that bad, but also not extremely good                 3 stars   0.492008
This was the worst meal I've ever had!                                 1 star    0.990348
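
Predictions like these can be reproduced with the transformers pipeline API (here repo_name stands in for this model's Hub ID, which is not given in this card):

from transformers import pipeline

classifier = pipeline("text-classification", model=repo_name)
classifier("This restaurant was the best ever, I really enjoyed the food there!")
# -> [{'label': '5 stars', 'score': 0.9673}]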