---
language: en
license: apache-2.0
datasets:
- amazon_reviews_multi
model-index:
- name: distilbert-base-uncased-finetuned-amazon-reviews
  results:
  - task:
      type: text-classification
      name: Text Classification
    dataset:
      type: amazon_reviews_multi
      name: amazon_reviews_multi
      split: test
    metrics:
    - type: accuracy
      value: 0.8558
      name: Accuracy top2
    - type: loss
      value: 1.2339
      name: Loss
tags:
- generated_from_keras_callback
pipeline_tag: text-classification
---

# Model Card for distilbert-base-uncased-finetuned-amazon-reviews

# Table of Contents

- [Model Card for distilbert-base-uncased-finetuned-amazon-reviews](#model-card-for-distilbert-base-uncased-finetuned-amazon-reviews)
- [Table of Contents](#table-of-contents)
- [Model Details](#model-details)
- [Uses](#uses)
- [Training Details](#training-details)
- [Evaluation](#evaluation)
- [Framework versions](#framework-versions)

# Model Details

## Model Description

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset. It reaches an exact-match accuracy of 56.96% on the dev set.

- **Model type:** Language model
- **Language(s) (NLP):** en
- **License:** apache-2.0
- **Parent Model:** For more details about DistilBERT, check out [this model card](https://huggingface.co/distilbert-base-uncased).
- **Resources for more information:**
  - [Model Documentation](https://huggingface.co/docs/transformers/main/en/model_doc/distilbert#transformers.DistilBertForSequenceClassification)

# Uses

You can use this model directly with a pipeline for text classification:
```python
from transformers import pipeline

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
classifier = pipeline("text-classification", model=checkpoint)
classifier(["Replace me by any text you'd like."])
```

and in TensorFlow:

```python
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

checkpoint = "amir7d0/distilbert-base-uncased-finetuned-amazon-reviews"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = TFAutoModelForSequenceClassification.from_pretrained(checkpoint)

text = "Replace me by any text you'd like."
encoded_input = tokenizer(text, return_tensors='tf')
output = model(encoded_input)
```

# Training Details

## Training and Evaluation Data

The model was fine-tuned on the raw [amazon_reviews_multi](https://huggingface.co/datasets/amazon_reviews_multi) dataset, which contains 200,000, 5,000, and 5,000 reviews in the training, dev, and test sets respectively.

## Fine-tuning hyperparameters

The following hyperparameters were used during training:

+ learning_rate: 2e-05
+ train_batch_size: 16
+ eval_batch_size: 16
+ seed: 42
+ optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ lr_scheduler_type: linear
+ num_epochs: 5

# Evaluation

The fine-tuned model was evaluated on the dev and test sets of `amazon_reviews_multi`.

- Accuracy (exact) is the exact match of the number of stars.
- Accuracy (off-by-1) is the percentage of reviews where the number of stars the model predicts differs by at most 1 from the number given by the human reviewer.

| Split    | Accuracy (exact) | Accuracy (off-by-1) |
| -------- | ---------------- | ------------------- |
| Dev set  | 56.96%           | 85.50%              |
| Test set | 57.36%           | 85.58%              |

# Framework versions

- Transformers 4.26.1
- TensorFlow 2.11.0
- Datasets 2.1.0
- Tokenizers 0.13.2
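The off-by-1 accuracy metric used in the evaluation above can be sketched in a few lines. Note that `off_by_one_accuracy` is a hypothetical helper written for illustration, not a function shipped with this repository; it assumes predictions and references are integer star ratings (1 through 5).

```python
def off_by_one_accuracy(predictions, references):
    """Fraction of reviews where the predicted star rating differs
    from the human reviewer's rating by at most 1 star.
    (Illustrative helper; not part of this repository.)"""
    assert len(predictions) == len(references)
    within_one = sum(abs(p - r) <= 1 for p, r in zip(predictions, references))
    return within_one / len(predictions)

# Example: 4 of the 5 predictions are within one star of the reference.
print(off_by_one_accuracy([5, 3, 1, 2, 4], [4, 3, 3, 2, 4]))  # 0.8
```

Exact-match accuracy is the same computation with `abs(p - r) <= 1` replaced by `p == r`.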