Jupyter Notebooks
GitHub link : lihuicham/airbnb-helpfulness-classifier
Fine-tuning Python code in finetuning.ipynb
Team Members (S001 - Synthetic Expert Team E) :
Li Hui Cham, Isaac Sparrow, Christopher Arraya, Nicholas Wong, Lei Zhang, Leonard Yang
Description
This model is an AirBnB review helpfulness classifier. It predicts how helpful a review on the AirBnB website is, from most helpful (A) to least helpful (C).
Pre-trained LLM
Our project fine-tuned FacebookAI/roberta-base for multi-class text (sequence) classification.
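As a minimal sketch of what multi-class classification means here, the snippet below turns the three raw scores (logits) from a classification head into one of the helpfulness labels. The index order (0 = A, 1 = B, 2 = C), the middle label B, and the example logits are assumptions for illustration; the actual mapping is whatever the fine-tuning notebook configured.

```python
import math

# Assumed label order: the card only names A (most helpful) and C (least helpful).
ID2LABEL = {0: "A", 1: "B", 2: "C"}
LABEL2ID = {label: idx for idx, label in ID2LABEL.items()}


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return ID2LABEL[best], probs[best]


# Made-up logits for a single review, not real model output.
example_logits = [2.1, 0.3, -1.2]
label, confidence = predict_label(example_logits)
print(label, round(confidence, 3))
```

With a real checkpoint, the logits would come from the model's forward pass rather than a hard-coded list.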
Dataset
5000 samples were scraped from the AirBnB website based on the listing_id values
from this Kaggle AirBnB Listings & Reviews dataset. Samples were translated from French to English.
Training Set : 4560 samples synthetically labelled by GPT-4 Turbo. Cost was approximately $60.
Test/Evaluation Set : 500 samples labelled manually by two groups (each group labelled 250 samples); the majority vote applies. A scoring rubric (shown below) was used for labelling.
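The majority-vote step can be sketched as follows. This is an illustration only: the function name majority_label, the example votes, and the tie handling (returning None so the sample can be discussed again) are assumptions, since the card does not say how many annotators voted per sample or how ties were resolved.

```python
from collections import Counter


def majority_label(votes):
    """Return the most frequent label among the votes, or None on a tie."""
    if not votes:
        raise ValueError("at least one vote is required")
    ranked = Counter(votes).most_common()
    top_label, top_count = ranked[0]
    # A tie for first place has no clear winner (hypothetical policy).
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


# Hypothetical votes from three annotators on one review.
print(majority_label(["A", "B", "A"]))
```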
Training Details
hyperparameters = {
    'learning_rate': 3e-05,
    'per_device_train_batch_size': 16,
    'weight_decay': 1e-04,
    'num_train_epochs': 4,
    'warmup_steps': 500,
}
We trained our model on Colab Pro, which cost us approximately 56 compute units.
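To put the hyperparameters above in context, the sketch below works out how many optimizer steps the run takes and what share of them falls within warmup. It assumes a single device and no gradient accumulation, neither of which is stated in the card.

```python
import math

train_samples = 4560   # training set size from the Dataset section
batch_size = 16        # per_device_train_batch_size
epochs = 4             # num_train_epochs
warmup_steps = 500     # warmup_steps

num_devices = 1        # assumption: one Colab GPU
grad_accum_steps = 1   # assumption: no gradient accumulation

# One optimizer step consumes this many samples.
effective_batch = batch_size * num_devices * grad_accum_steps

steps_per_epoch = math.ceil(train_samples / effective_batch)
total_steps = steps_per_epoch * epochs
warmup_fraction = warmup_steps / total_steps

print(steps_per_epoch, total_steps, round(warmup_fraction, 2))
# prints: 285 1140 0.44
```

Under these assumptions, warmup covers roughly 44% of the run, so the learning rate reaches its 3e-05 peak a little before the end of the second epoch.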
Slides