maclean-connor96/feedier-french-books

Text Classification

Model and approach 🤗

Because I am limited by my personal computer, I fine-tuned the distilbert-base-multilingual-cased model. DistilBERT is reported to run about 60% faster than the original BERT model while preserving roughly 95% of its accuracy.

The provided dataset contains book titles, authors, reviews, and a score for each book. These columns were concatenated into one large context block per book, which serves as the input text. The original labels (0, 1, and -1) were normalized to 0, 1, and 2, and then mapped to NEUTRAL, POSITIVE, and NEGATIVE to make the predictions easier to read.
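The preprocessing described above can be sketched roughly as follows. This is a minimal illustration, not the exact code used for this model: the column names (`title`, `author`, `review`, `score`, `label`), the space separator, and the helper `build_example` are assumptions, and the sample row is made up for demonstration.

```python
# Assumed mapping, following the order in which the labels are listed above:
# raw label 0 -> 0 (NEUTRAL), 1 -> 1 (POSITIVE), -1 -> 2 (NEGATIVE).
LABEL_TO_ID = {0: 0, 1: 1, -1: 2}
ID_TO_NAME = {0: "NEUTRAL", 1: "POSITIVE", 2: "NEGATIVE"}

# Hypothetical column names; the real dataset may use different ones.
TEXT_COLUMNS = ("title", "author", "review", "score")


def build_example(row):
    """Concatenate the text columns and normalize the label of one row."""
    text = " ".join(str(row[col]) for col in TEXT_COLUMNS)
    label_id = LABEL_TO_ID[row["label"]]
    return {"text": text, "label": label_id, "label_name": ID_TO_NAME[label_id]}


# Made-up row, for illustration only.
row = {
    "title": "Le Petit Prince",
    "author": "Antoine de Saint-Exupéry",
    "review": "Magnifique.",
    "score": 5,
    "label": 1,
}
example = build_example(row)
print(example["text"])
print(example["label"], example["label_name"])
```

Note that the score ends up inside the input text, so the classifier sees it directly.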

Because this exercise is only meant to demonstrate my ability to train a model, the model was trained for 2 epochs on 3,000 training entries and evaluated on 300 test entries.
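One way to draw a subset of this size is to shuffle the rows and slice off the first 3,000 and the next 300. The sketch below assumes the rows fit in a Python list; the helper `sample_split` and the seed value are hypothetical, and the actual selection procedure used here may differ.

```python
import random


def sample_split(rows, n_train=3000, n_test=300, seed=42):
    """Shuffle a copy of `rows` and return disjoint train and test subsets."""
    if len(rows) < n_train + n_test:
        raise ValueError("not enough rows for the requested split sizes")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(rows)      # copy, so the caller's list is left untouched
    rng.shuffle(shuffled)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    return train, test


# Stand-in data: integers in place of real dataset rows.
train_rows, test_rows = sample_split(list(range(5000)))
print(len(train_rows), len(test_rows))
```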

Notes on the three classes and the model's bias 📝

The classes are not evenly distributed across the dataset. Although the data is shuffled, positive reviews are the most common and are therefore the most frequently predicted category. In addition, the decision to keep the review score in the text block affected the model's biases: the model can make a prediction based on the score alone, a number between 1 and 5.

Positive reviews: 2081

Negative reviews: 224

Neutral reviews: 695
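To make the imbalance concrete, the short sketch below turns the counts above into class shares. The counts sum to 3,000, which matches the size of the training set, and positive reviews account for roughly 69% of them. The variable names are illustrative only.

```python
# Class counts as listed above.
counts = {"POSITIVE": 2081, "NEGATIVE": 224, "NEUTRAL": 695}

total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

# The most frequent class; a model that always predicted it would already
# reach an accuracy equal to that class's share on data with this distribution.
majority = max(counts, key=counts.get)

for name, n in counts.items():
    print(f"{name:<8} {n:>5}  {shares[name]:.1%}")
print(f"total: {total}, majority class: {majority}")
```

Any reported accuracy is therefore best read against this majority-class baseline rather than against the 33% one would expect from three balanced classes.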
