	---
	library_name: transformers
	datasets:
	- stanfordnlp/imdb
	metrics:
	- accuracy
	tags:
	- PyTorch
	model-index:
	- name: distilbert-imdb
	  results:
	  - task:
	      name: Text Classification
	      type: text-classification
	    dataset:
	      name: imdb
	      type: imdb
	      args: plain_text
	    metrics:
	    - name: Accuracy
	      type: accuracy
	      value: 0.9316
	pipeline_tag: text-classification
	license: apache-2.0
	language:
	- en
	---
	# distilbert-imdb

	This is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on the IMDb dataset.

	## Performance
	- Loss: 0.1958
	- Accuracy: 0.932
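
For reference, accuracy here is the fraction of reviews whose predicted label matches the gold label. The snippet below is a minimal sketch of that computation on hypothetical toy labels, not the actual evaluation data or the evaluation script used for this model.

```python
def accuracy(predictions, references):
    """Fraction of positions where the predicted label equals the reference label."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)


# Toy labels for illustration only (assumed convention: 0 = negative, 1 = positive).
toy_predictions = [1, 0, 1, 1]
toy_references = [1, 0, 0, 1]
print(accuracy(toy_predictions, toy_references))  # 0.75
```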

	## How to Get Started with the Model

	Use the code below to get started with the model:

	```python
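	from transformers import DistilBertTokenizer, pipeline

	# Load the tokenizer of the base checkpoint the model was fine-tuned from.
	tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")

	# Build a sentiment-analysis pipeline around the fine-tuned classifier.
	classifier = pipeline("sentiment-analysis", model="3oclock/distilbert-imdb", tokenizer=tokenizer)

	# The pipeline returns a list with one dict per input, each holding a label and a score.
	result = classifier("I love this movie!")
	print(result)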
	```
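
The pipeline returns a list of dicts with `label` and `score` keys. This card does not state which label names the checkpoint uses, so the sketch below assumes the generic `LABEL_0` / `LABEL_1` ids and the IMDb convention of 0 = negative, 1 = positive. The `readable` helper and the sample input are hypothetical; check the model's `id2label` config before relying on this mapping.

```python
# Assumed mapping from generic label ids to sentiment names (not confirmed by this card).
ASSUMED_LABEL_NAMES = {"LABEL_0": "negative", "LABEL_1": "positive"}


def readable(results, label_names=ASSUMED_LABEL_NAMES):
    """Convert pipeline-style dicts into (sentiment, score) tuples.

    Labels missing from the mapping are passed through unchanged.
    """
    return [(label_names.get(item["label"], item["label"]), item["score"]) for item in results]


# Hand-written stand-in for a pipeline result; this is not real model output.
example = [{"label": "LABEL_1", "score": 0.98}]
print(readable(example))  # [('positive', 0.98)]
```
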
	## Model Details

	### Model Description

	This model is DistilBERT fine-tuned with 🤗 Transformers for binary sentiment classification on the IMDb dataset.

	- Developed by: Ge Li
	- Model type: DistilBERT for Sequence Classification
	- Language(s) (NLP): English
	- License: Apache 2.0
	- Finetuned from model: `distilbert-base-uncased`


	## Uses

	### Direct Use

	This model can be used directly for sentiment analysis on movie reviews. It is best suited for classifying English-language text that is similar in nature to movie reviews.

	### Downstream Use

	This model can be fine-tuned on other sentiment analysis tasks or adapted for tasks like text classification in domains similar to IMDb movie reviews.

	### Out-of-Scope Use

	The model may not perform well on non-English text or text that is significantly different in style and content from the IMDb dataset (e.g., technical documents, social media posts).

	## Bias, Risks, and Limitations

	### Bias

	The IMDb dataset consists primarily of English-language movie reviews, so the model may not generalize well to other languages or other types of reviews.

	### Risks

	Misclassification in sentiment analysis can lead to incorrect conclusions in applications relying on this model.
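
One common way to limit the impact of misclassifications is to route low-confidence predictions to manual review instead of acting on them. The sketch below illustrates the idea on pipeline-style dicts. The `split_by_confidence` helper and the 0.8 threshold are hypothetical and not part of this model; a suitable threshold would need to be chosen on held-out data.

```python
def split_by_confidence(results, threshold=0.8):
    """Split predictions into (confident, uncertain) lists by their top score.

    The default threshold is an illustrative assumption, not a tuned value.
    """
    confident, uncertain = [], []
    for item in results:
        if item["score"] >= threshold:
            confident.append(item)
        else:
            uncertain.append(item)
    return confident, uncertain


# Hand-written stand-ins for pipeline results; these are not real model outputs.
sample = [
    {"label": "LABEL_1", "score": 0.97},
    {"label": "LABEL_0", "score": 0.55},
]
confident, uncertain = split_by_confidence(sample)
print(len(confident), len(uncertain))  # 1 1
```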

	### Limitations

	The model was trained on a dataset of movie reviews, so it may not perform as well on other types of text data.

	### Recommendations

	Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model.