README.md · NanditaP/Filter_Non-ExpertUser at main


license: cc-by-nc-nd-4.0

This model is a multi-class classifier fine-tuned from 'bert-base-uncased'.

It is built around a large corpus of Twitter users' metadata.

It sorts the data into 3 main categories: Non-ExpertUser, ExpertUser, and Other. The aim of this project was to determine whether a tweet was posted by an individual and, if so, whether that person is an expert in the field of security and privacy.
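A minimal sketch of how the classifier might be queried with the `transformers` library. The label indices come from the validation table in this card. The `classify` helper, the assumption that the repository ships its own tokenizer files, and the assumption that the input is a short piece of user metadata text (for example a profile description) are illustrative and not confirmed by the authors.

```python
# Label indices as listed in the validation table of this card.
ID2LABEL = {0: "ExpertUser", 1: "Non-ExpertUser", 2: "Other"}


def label_from_logits(logits):
    """Return the class name whose logit is largest."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return ID2LABEL[best]


def classify(texts, model_id="NanditaP/Filter_Non-ExpertUser"):
    """Hypothetical helper: classify a list of strings.

    Requires `transformers` and `torch`; downloads the model on first use.
    """
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Assumption: the repository includes tokenizer files. If it does not,
    # the 'bert-base-uncased' tokenizer would be the natural substitute.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    return [label_from_logits(row) for row in logits.tolist()]
```

Calling `classify(["some user metadata text"])` would return one class name per input string. The index-to-name mapping is kept explicit in case the model's own config only exposes generic label names.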

Originally, the model had 4 classes: the 'Other' class was split into 'Non-Person' (denoting accounts such as organizations) and 'Unknown'.

Since the main aim was to determine whether a user is a non-expert, the number of classes was reduced to 3 in this version (version 2).
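The reduction from 4 classes to 3 can be pictured as a label mapping. This is only a sketch: the card does not say whether the data was relabeled before retraining or handled some other way, and the version 1 names for the two user classes are assumed to match the current ones. `V1_TO_V2` and `collapse_labels` are hypothetical names.

```python
# Assumed version 1 class names mapped to the version 2 classes.
V1_TO_V2 = {
    "ExpertUser": "ExpertUser",
    "Non-ExpertUser": "Non-ExpertUser",
    "Non-Person": "Other",  # e.g. organization accounts
    "Unknown": "Other",
}


def collapse_labels(labels):
    """Map a list of 4-class labels onto the 3-class scheme."""
    return [V1_TO_V2[label] for label in labels]
```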

The validation scores for the model were as follows:

Accuracy = 0.93

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| ExpertUser (0) | 0.88 | 0.90 | 0.89 |
| Non-ExpertUser (1) | 0.95 | 0.97 | 0.96 |
| Other (2) | 0.85 | 0.78 | 0.81 |
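As a quick sanity check on the scores above, the F1 values can be recomputed from precision and recall as their harmonic mean, and unweighted (macro) averages can be derived. The macro averages are computed here from the rounded per-class values and are not figures reported by the authors. The names `SCORES`, `f1`, and `macro_average` are illustrative.

```python
# Per-class (precision, recall, reported F1) copied from the validation scores.
SCORES = {
    "ExpertUser": (0.88, 0.90, 0.89),
    "Non-ExpertUser": (0.95, 0.97, 0.96),
    "Other": (0.85, 0.78, 0.81),
}


def f1(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def macro_average(scores):
    """Unweighted mean of each metric across classes."""
    n = len(scores)
    return {
        "precision": sum(p for p, _, _ in scores.values()) / n,
        "recall": sum(r for _, r, _ in scores.values()) / n,
        "f1": sum(f for _, _, f in scores.values()) / n,
    }


for name, (p, r, reported) in SCORES.items():
    # Each recomputed F1 should round to the reported value.
    print(name, round(f1(p, r), 2), reported)

print(macro_average(SCORES))
```

The 'Other' class has the lowest recall, so it pulls the macro averages below the overall accuracy of 0.93.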

Paper: The design of the model is described in "Perspectives of non-expert users on cyber security and privacy: An analysis of online discussions on Twitter" (https://doi.org/10.1016/j.cose.2022.103008).

Please cite the paper if you use this model:

Nandita Pattnaik, Shujun Li, and Jason R.C. Nurse. 2023.
Perspectives of non-expert users on cyber security and privacy: An analysis of online discussions on Twitter.
Computers & Security 125 (2023), 103008. https://doi.org/10.1016/j.cose.2022.103008