---
library_name: transformers
datasets:
- stanfordnlp/imdb
metrics:
- accuracy
tags:
- PyTorch
model-index:
- name: distilbert-imdb
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: imdb
      type: imdb
      args: plain_text
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.9316
pipeline_tag: text-classification
license: apache-2.0
language:
- en
---

# distilbert-imdb

This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the IMDb dataset.

## Performance

- Loss: 0.1958
- Accuracy: 0.932
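
For reference, accuracy is the fraction of reviews whose predicted label matches the reference label. The sketch below computes it for a handful of made-up labels; the `accuracy` function and the toy data are illustrative assumptions, not the evaluation script used for this model.

```python
from typing import Sequence


def accuracy(predictions: Sequence[int], references: Sequence[int]) -> float:
    """Return the fraction of predictions that equal their reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy example with hypothetical labels: 0 = negative, 1 = positive.
toy_predictions = [1, 0, 1, 1, 0]
toy_references = [1, 0, 0, 1, 0]
print(accuracy(toy_predictions, toy_references))  # 4 of 5 correct
```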

## How to Get Started with the Model

Use the code below to get started with the model:

```python
from transformers import DistilBertTokenizer, pipeline

# Load the tokenizer of the base checkpoint the model was fine-tuned from.
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

# Build a sentiment classifier around the fine-tuned weights.
classifier = pipeline("sentiment-analysis", model="3oclock/distilbert-imdb", tokenizer=tokenizer)

# The pipeline returns a list with one {"label", "score"} dict per input.
result = classifier("I love this movie!")
print(result)
```
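
The pipeline returns a list of dictionaries, each with a `label` and a `score`. If the checkpoint's configuration does not define readable label names, the labels typically appear as `LABEL_0` and `LABEL_1`. The sketch below maps them to sentiment names, assuming index 0 is negative and index 1 is positive as in the `stanfordnlp/imdb` dataset. The `to_sentiment` helper and the sample output are hypothetical, so check the labels the model actually emits before relying on this mapping.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]

# Assumed mapping: IMDb uses 0 for negative and 1 for positive reviews.
LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def to_sentiment(outputs: List[Prediction]) -> List[Prediction]:
    """Replace generic label ids with readable names; unknown labels stay unchanged."""
    return [
        {"label": LABEL_NAMES.get(str(item["label"]), item["label"]), "score": item["score"]}
        for item in outputs
    ]


# Hypothetical pipeline output, not a real prediction from this model.
sample_output = [{"label": "LABEL_1", "score": 0.98}]
print(to_sentiment(sample_output))
```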

## Model Details

### Model Description

This is the model card for a 🤗 Transformers model fine-tuned on the IMDb dataset.

- **Developed by:** Ge Li
- **Model type:** DistilBERT for Sequence Classification
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** `distilbert-base-uncased`

## Uses

### Direct Use

This model can be used directly for sentiment analysis of movie reviews. It is best suited to classifying English-language text that resembles movie reviews.

### Downstream Use

This model can be further fine-tuned for other sentiment analysis tasks or adapted to text classification in domains similar to IMDb movie reviews.

### Out-of-Scope Use

The model may not perform well on non-English text or on text that differs significantly in style and content from the IMDb dataset (e.g., technical documents, social media posts).

## Bias, Risks, and Limitations

### Bias

The IMDb dataset consists primarily of English-language movie reviews, so the model may not generalize well to other languages or to other types of reviews.

### Risks

Misclassified sentiment can lead to incorrect conclusions in applications that rely on this model.
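
One way to limit the impact of misclassification is to act only on confident predictions and send the rest to manual review. The sketch below assumes pipeline-style outputs with a `score` field. The `flag_uncertain` helper, the 0.9 threshold, and the sample scores are illustrative assumptions; a suitable threshold would need to be chosen for each application.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def flag_uncertain(outputs: List[Prediction], threshold: float = 0.9) -> List[Prediction]:
    """Relabel predictions whose score is below `threshold` as "uncertain"."""
    flagged = []
    for item in outputs:
        label = item["label"] if float(item["score"]) >= threshold else "uncertain"
        flagged.append({"label": label, "score": item["score"]})
    return flagged


# Hypothetical outputs, not real predictions from this model.
sample_outputs = [
    {"label": "positive", "score": 0.97},
    {"label": "negative", "score": 0.58},
]
print(flag_uncertain(sample_outputs))
```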

### Limitations

The model was trained on movie reviews, so it may not perform as well on other types of text.

### Recommendations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.