---
language: en
license: apache-2.0
datasets:
  - amazon_reviews_multi
model-index:
  - name: distilbert-base-uncased-finetuned-amazon-reviews
    results:
      - task:
          type: text-classification
          name: Text Classification
        dataset:
          type: amazon_reviews_multi
          name: amazon_reviews_multi
          split: test
        metrics:
          - type: accuracy
            value: 0.8558
            name: Accuracy top2
          - type: loss
            value: 1.2339
            name: Loss

tags:
- generated_from_keras_callback

pipeline_tag: text-classification
---


# Model Card for distilbert-base-uncased-finetuned-amazon-reviews


# Table of Contents

- [Model Card for distilbert-base-uncased-finetuned-amazon-reviews](#model-card-for-distilbert-base-uncased-finetuned-amazon-reviews)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Training Details](#training-details)
- [Accuracy](#accuracy)
- [Framework versions](#framework-versions)


# Model Details

## Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset.
It reaches an exact-match accuracy of 56.96% (85.50% off-by-1) on the dev set.

- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** For more details about DistilBERT, check out [this model card](https://huggingface.co/distilbert-base-uncased).
- **Resources for more information:**
    - [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)


# Uses

You can use this model directly with a pipeline for text classification.

```python
from transformers import pipeline

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
classifier = pipeline("text-classification", model=checkpoint)
classifier(["Replace me by any text you'd like."])
```
Or load the tokenizer and model directly in TensorFlow:
```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
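
The returned `output.logits` are unnormalized scores over the five classes. As a quick, hedged follow-up to the example above (assuming the five classes map to 1-5 stars in ascending order; check the model's `id2label` config to confirm):
```python
import tensorflow as tf

# Continue from the example above: turn logits into probabilities and a star rating.
# Assumption: class index i corresponds to i+1 stars.
probs = tf.nn.softmax(output.logits, axis=-1)
predicted_star = int(tf.argmax(probs, axis=-1)[0]) + 1
print(predicted_star, probs.numpy()[0])
```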


# Training Details

## Training and Evaluation Data

The model was fine-tuned on the raw [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset, which contains 200,000 reviews in the training set and 5,000 reviews each in the dev and test sets.
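
A minimal sketch of loading the data with the `datasets` library; the `en` configuration is an assumption here, chosen because its split sizes match the counts above:
```python
from datasets import load_dataset

# Assumption: the English ("en") configuration, whose splits contain
# 200,000 / 5,000 / 5,000 reviews (train / validation / test).
raw_datasets = load_dataset("amazon_reviews_multi", "en")
print(raw_datasets)
```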

## Fine-tuning hyperparameters

The following hyperparameters were used during training:

+ learning_rate: 2e-05
+ train_batch_size: 16
+ eval_batch_size: 16
+ seed: 42
+ optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ lr_scheduler_type: linear
+ num_epochs: 5
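
The following is a minimal Keras sketch of a training setup with these hyperparameters, not the exact script used to produce this checkpoint; `train_dataset` and `val_dataset` are placeholder `tf.data.Dataset` objects of tokenized reviews, and the warmup step count is an assumption:
```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, create_optimizer

tf.keras.utils.set_random_seed(42)  # seed: 42

model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5
)

batch_size = 16
num_epochs = 5
steps_per_epoch = 200_000 // batch_size  # 200,000 training reviews

# Adam with betas=(0.9, 0.999), epsilon=1e-08 and a linear learning-rate schedule.
optimizer, _ = create_optimizer(
    init_lr=2e-5,
    num_train_steps=steps_per_epoch * num_epochs,
    num_warmup_steps=0,  # assumption: no warmup is listed above
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

# Transformers TF models compute the loss internally when labels are in the dataset.
model.compile(optimizer=optimizer, metrics=["accuracy"])
# model.fit(train_dataset, validation_data=val_dataset, epochs=num_epochs)
```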

## Accuracy

The fine-tuned model was evaluated on the dev and test sets of `amazon_reviews_multi`.
- Accuracy (exact) is the percentage of reviews for which the predicted number of stars exactly matches the number given by the human reviewer.
- Accuracy (off-by-1) is the percentage of reviews for which the predicted number of stars differs by at most 1 from the number given by the human reviewer.

| Split    | Accuracy (exact) | Accuracy (off-by-1) |
| -------- | ---------------- | ------------------- |
| Dev set  | 56.96%           | 85.50%              |
| Test set | 57.36%           | 85.58%              |
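
For reference, a small sketch (a hypothetical helper, not part of this repository) of how both metrics can be computed from integer star labels:
```python
import numpy as np

def star_accuracies(y_true, y_pred):
    """Exact and off-by-1 accuracy for star ratings given as integer class ids."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    exact = float(np.mean(y_true == y_pred))
    off_by_1 = float(np.mean(np.abs(y_true - y_pred) <= 1))
    return exact, off_by_1

# Tiny illustration: the second prediction is one star off and still counts
# toward off-by-1 accuracy; the last is two stars off and does not.
exact, off1 = star_accuracies([0, 2, 4, 3], [0, 3, 4, 1])
print(f"exact={exact:.2%}, off-by-1={off1:.2%}")  # exact=50.00%, off-by-1=75.00%
```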



# Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.1.0
- Tokenizers 0.13.2