---
language: en
license: apache-2.0
datasets:
- amazon_reviews_multi
model-index:
- name: distilbert-base-uncased-finetuned-amazon-reviews
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: amazon-reviews-multi
      name: amazon_reviews_multi
      split: test
    metrics:
    - type: accuracy
      value: 0.8558
      name: Accuracy top2
    - type: loss
      value: 1.2339
      name: Loss
tags:
- generated_from_keras_callback
pipeline_tag: text-classification
---

# Model Card for distilbert-base-uncased-finetuned-amazon-reviews

# Table of Contents

- [Model Card for distilbert-base-uncased-finetuned-amazon-reviews](#model-card-for-distilbert-base-uncased-finetuned-amazon-reviews)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Framework versions](#framework-versions)

# Model Details

## Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset.
It predicts the star rating (1-5) of an English Amazon review and reaches 56.96% exact accuracy (85.50% off-by-1) on the dev set; see [Evaluation](#evaluation) for details.

- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** For more details about DistilBERT, check out [this model card](https://huggingface.co/distilbert-base-uncased).
- **Resources for more information:**
  - [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)

# Uses

You can use this model directly with a pipeline for text classification:

```
from transformers import pipeline

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
classifier = pipeline("text-classification", model=checkpoint)
classifier(["Replace me with any text you'd like."])
```
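If you want the score for every class rather than only the top prediction, recent versions of transformers let you pass `top_k=None` to the text-classification pipeline. A minimal sketch, continuing from the example above (the label names in the output come from this checkpoint's `id2label` config):

```
# Sketch: request scores for all classes instead of only the best one
scores = classifier("Great product, works exactly as described!", top_k=None)
print(scores)  # list of {"label": ..., "score": ...} dicts, one per class
```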

and in TensorFlow:

```
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

# Tokenize the input and run the model to get per-class logits
text = "Replace me with any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```
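Continuing from the TensorFlow example, `output.logits` can be turned into class probabilities and a predicted label. A minimal sketch (the label names come from this checkpoint's `id2label` config):

```
import tensorflow as tf

# Convert logits to probabilities and pick the most likely class
probs = tf.nn.softmax(output.logits, axis=-1)
predicted_class_id = int(tf.math.argmax(probs, axis=-1)[0])
print(model.config.id2label[predicted_class_id], float(probs[0, predicted_class_id]))
```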

# Training Details

## Training and Evaluation Data

The model was fine-tuned on the raw [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset.
The dataset contains 200,000, 5,000, and 5,000 reviews in the training, dev, and test sets respectively.
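
As a sketch of how the data can be prepared, assuming the English configuration of the dataset and that the 1-5 values in its `stars` column are shifted to 0-4 class labels (the exact preprocessing used for this checkpoint is not documented here):

```
from datasets import load_dataset
from transformers import AutoTokenizer

# English configuration: 200,000 train / 5,000 validation / 5,000 test reviews
raw_datasets = load_dataset("amazon_reviews_multi", "en")

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def preprocess(batch):
    # Tokenize the review text and map 1-5 stars to labels 0-4
    tokens = tokenizer(batch["review_body"], truncation=True)
    tokens["label"] = [stars - 1 for stars in batch["stars"]]
    return tokens

tokenized_datasets = raw_datasets.map(preprocess, batched=True)
```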

## Fine-tuning hyperparameters

The following hyperparameters were used during training (a Keras setup matching them is sketched after the list):

+ learning_rate: 2e-05
+ train_batch_size: 16
+ eval_batch_size: 16
+ seed: 42
+ optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
+ lr_scheduler_type: linear
+ num_epochs: 5

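
The training script is not included in this card; the following is a minimal Keras sketch of an optimizer and compile setup matching the hyperparameters above, using transformers' `create_optimizer` (Adam with a linear learning-rate decay). `train_set` and `dev_set` are assumed to be `tf.data.Dataset` objects built from the tokenized data, and the number of warmup steps and the 5-class head are assumptions, not values stated on the card.

```
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification, create_optimizer

tf.keras.utils.set_random_seed(42)  # seed

num_train_examples = 200_000  # training set size from the dataset description above
batch_size = 16               # train_batch_size
num_epochs = 5
num_train_steps = (num_train_examples // batch_size) * num_epochs

# Adam with a linearly decaying learning rate, matching the listed settings
optimizer, lr_schedule = create_optimizer(
    init_lr=2e-5,
    num_train_steps=num_train_steps,
    num_warmup_steps=0,  # assumption: no warmup is listed on the card
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)

model = TFAutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=5  # assumption: 1-5 star classes
)
model.compile(optimizer=optimizer)  # the model's internal loss is used when labels are provided
# model.fit(train_set, validation_data=dev_set, epochs=num_epochs)
```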

## Evaluation

The fine-tuned model was evaluated on the dev and test sets of `amazon_reviews_multi`. Two accuracy metrics are reported; a sketch of how they can be computed follows the table.

- Accuracy (exact) is the percentage of reviews for which the predicted number of stars exactly matches the number given by the human reviewer.
- Accuracy (off-by-1) is the percentage of reviews for which the predicted number of stars differs by at most 1 from the number given by the human reviewer.

| Split    | Accuracy (exact) | Accuracy (off-by-1) |
| -------- | ---------------- | ------------------- |
| Dev set  | 56.96%           | 85.50%              |
| Test set | 57.36%           | 85.58%              |
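
For reference, a small sketch (a hypothetical helper, not part of the released code) of how the two accuracies can be computed from integer star predictions:

```
import numpy as np

def star_accuracies(predicted_stars, true_stars):
    """Exact-match and off-by-1 accuracy for 1-5 star predictions."""
    predicted_stars = np.asarray(predicted_stars)
    true_stars = np.asarray(true_stars)
    exact = np.mean(predicted_stars == true_stars)
    off_by_1 = np.mean(np.abs(predicted_stars - true_stars) <= 1)
    return exact, off_by_1

# e.g. star_accuracies([5, 3, 1], [4, 3, 3]) -> (0.333..., 0.666...)
```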

# Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.1.0
- Tokenizers 0.13.2