---
license: apache-2.0
base_model: distilbert-base-uncased
tags:
- generated_from_trainer
metrics:
- accuracy
- f1
model-index:
- name: sentiment-analysis-browser-extension
  results: []
language:
- en
---


# Fine-tuned BERT model
We open-source this fine-tuned BERT model to identify critical aspects in user reviews of ad-blocking extensions. For each user review, the model outputs a criticality score in the range -1 to 1, where more negative scores indicate a higher probability that the review discusses critical topics.
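This card does not say how the criticality score is derived from the classifier's outputs. One plausible construction, shown here purely as an assumption, treats the model as a two-class classifier and subtracts the probability of the critical class from that of the non-critical class. The result always lies in [-1, 1] and is negative whenever the critical class is more likely. The label order (index 0 = critical, index 1 = non-critical) and the helper names are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def criticality_score(logits):
    """Hypothetical mapping from two-class logits to a score in [-1, 1].

    Assumption (not confirmed by this card): index 0 = critical,
    index 1 = non-critical. Negative values mean "critical" is more probable.
    """
    p_critical, p_non_critical = softmax(logits)
    return p_non_critical - p_critical


# Made-up logits where the (assumed) critical class dominates
print(criticality_score([2.0, -1.0]))
```

With equal logits the two probabilities are both 0.5 and the score is 0, which matches the card's description of negative scores signalling critical content.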

We used [`distilbert-base-uncased`](https://huggingface.co/distilbert-base-uncased) as the base model and fine-tuned it on a manually annotated dataset of web store reviews.

Further details can be found in our AsiaCCS paper - [`From User Insights to Actionable Metrics: A User-Focused Evaluation of Privacy-Preserving Browser Extensions`](https://doi.org/10.1145/3634737.3657028).

**Note:** We have not tested the model's accuracy on user reviews from other categories, but we are open to discussing whether it can be extended to other product categories. Feel free to raise issues in the repository or contact the author directly.

## Intended uses & limitations
The model is released for free use and was not trained on any private user data. If you use it, please cite the paper above in your work.

## Evaluation data

It achieves the following results on the evaluation set:
- Loss: 0.4768
- Accuracy: 0.8615
- F1: 0.8816
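For readers less familiar with these metrics, the sketch below shows how accuracy and F1 are computed from predicted and reference labels. The card does not state whether its F1 is binary or averaged over classes; this sketch assumes binary F1 with a hypothetical positive class `1`, and the labels are made up for illustration, not taken from the evaluation set.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the reference labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """F1 for one positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision and recall are both 0 (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels: 4 true positives, 1 false positive, 1 false negative, 2 true negatives
y_true = [1, 1, 0, 1, 0, 0, 1, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
print(accuracy(y_true, y_pred), binary_f1(y_true, y_pred))
```

Here accuracy is 6/8 = 0.75, while precision and recall are both 4/5, so F1 is 0.8.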

## Training procedure

The training dataset consisted of 620 reviews, and the test dataset consisted of 150 reviews. The following hyperparameters were used during training:

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 6
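To make the `linear` scheduler setting concrete, the sketch below computes the total number of optimizer steps and the learning rate at a given step, modelled on the documented behaviour of the linear warmup/decay schedule in Transformers (`get_linear_schedule_with_warmup`). It assumes a single device, no gradient accumulation, no dropped final batch, and zero warmup steps, since no warmup value is listed; these are assumptions, not details confirmed by the card. The helper names are hypothetical.

```python
import math

# Values from the training description and hyperparameter list
LEARNING_RATE = 2e-5
TRAIN_EXAMPLES = 620
TRAIN_BATCH_SIZE = 16
NUM_EPOCHS = 6
WARMUP_STEPS = 0  # assumption: no warmup is listed, so none is modelled


def total_training_steps():
    """Optimizer steps, assuming one device and no gradient accumulation."""
    steps_per_epoch = math.ceil(TRAIN_EXAMPLES / TRAIN_BATCH_SIZE)  # last batch is partial
    return steps_per_epoch * NUM_EPOCHS


def linear_lr(step, total=None):
    """Learning rate at `step`: linear warmup, then linear decay to 0."""
    total = total_training_steps() if total is None else total
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    remaining = max(0, total - step)
    return LEARNING_RATE * remaining / max(1, total - WARMUP_STEPS)


total = total_training_steps()
print(total, linear_lr(0), linear_lr(total // 2), linear_lr(total))
```

Under these assumptions there are 39 steps per epoch (234 in total), and the learning rate falls from 2e-5 at the first step to about 1e-5 halfway through and to 0 at the end.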

### Framework versions

- Transformers 4.34.0
- Pytorch 2.0.1+cu118
- Datasets 2.14.5
- Tokenizers 0.14.1