Mobile App Classification

Model description

DistilBERT is a transformer model, smaller and faster than BERT, which was pre-trained on the same corpus in a self-supervised fashion, using the BERT base model as a teacher.

The distilbert-base-uncased model is fine-tuned to classify a mobile app description into one of 6 Play Store categories. It was trained on 9000 samples of English app descriptions and the associated categories of apps available on Google Play.

Fine-tuning

The model was fine-tuned for 5 epochs with a batch size of 16, a learning rate of 2e-05, and a maximum sequence length of 512. Because this is a classification task, the model was trained with a cross-entropy loss function. The best evaluation F1 score, 0.9034534096919489, was reached after 4 epochs. The accuracy of the model on the test set was 0.9033.
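As an illustration of the loss mentioned above, the following is a minimal pure-Python sketch of cross-entropy over six class logits. It is not the training code used for this model. The function names `softmax` and `cross_entropy` and the example logits are assumptions chosen for illustration.

```python
import math

NUM_CLASSES = 6  # the model predicts one of 6 Play Store categories


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for numerical stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target_index):
    """Negative log-probability assigned to the correct class."""
    probs = softmax(logits)
    return -math.log(probs[target_index])


# Hypothetical logits for a single example; index 1 is taken as the true class.
example_logits = [0.2, 3.1, -0.5, 0.0, 0.4, -1.2]
loss = cross_entropy(example_logits, target_index=1)
print(f"loss = {loss:.4f}")
```

With all six logits equal, the loss is log(6), which is the value an uninformed classifier would score. A confident correct prediction drives the loss toward 0.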

How to use

from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

tokenizer = AutoTokenizer.from_pretrained("nsi319/distilbert-base-uncased-finetuned-app")  
model = AutoModelForSequenceClassification.from_pretrained("nsi319/distilbert-base-uncased-finetuned-app")

# 'text-classification' is the general task name; 'sentiment-analysis' is an alias for it
classifier = pipeline('text-classification', model=model, tokenizer=tokenizer)

classifier("Disney+ has something for everyone and every mood, all in one place. With endless entertainment from Disney, Pixar, Marvel, Star Wars, National Geographic and Star, there's always something exciting to watch. Watch the latest releases, Original series and movies, classic films, throwbacks and so much more.")

'''Output'''
[{'label': 'Entertainment', 'score': 0.9014402031898499}]

Limitations

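The training data consists of apps from only 6 Play Store categories: Education, Entertainment, Productivity, Sports, News & Magazines, and Photography. The model always assigns one of these labels, so descriptions of apps from any other category will still be mapped to one of the six.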
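One possible way to soften this limitation is to treat low-confidence predictions as "unknown" rather than accepting the top label. The sketch below assumes the list-of-dicts output format shown in the usage example. The helper name `pick_category` and the threshold `min_score=0.5` are hypothetical and have not been validated against this model.

```python
def pick_category(predictions, min_score=0.5):
    """Return the top label, or None when the model is not confident enough.

    predictions: list of {'label': str, 'score': float} dicts, as returned
    by the classifier pipeline in the usage example.
    min_score: hypothetical cutoff; tune it on held-out data before relying on it.
    """
    top = max(predictions, key=lambda p: p["score"])
    if top["score"] < min_score:
        return None
    return top["label"]


# The output from the usage example passes the threshold.
print(pick_category([{"label": "Entertainment", "score": 0.9014402031898499}]))

# A made-up low-confidence prediction is rejected.
print(pick_category([{"label": "Sports", "score": 0.31}]))
```

A score threshold is only a heuristic: a classifier can be confidently wrong on out-of-scope inputs, so this does not replace checking that an app actually belongs to one of the six categories.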