
Fine-Tuned DistilBERT Model for Emotion Classification

This repository contains a notebook for fine-tuning the DistilBERT model on the emotion classification task. The trained model is available on the Hugging Face Model Hub.

Fine-Tuned Model Link: DistilBERT Base Uncased Fine-Tuned Emotion (ahmettasdemir/distilbert-base-uncased-finetuned-emotion on the Hugging Face Hub)

Dataset

The emotion classification task is performed on the emotion dataset, which contains English Twitter messages labeled with one of six emotions: sadness, joy, love, anger, fear, and surprise. The dataset is divided into three subsets:

  • Train: Used for training the model (16,000 samples)
  • Validation: Used for evaluating the model during training (2,000 samples)
  • Test: Used for evaluating the final model performance (2,000 samples)

The dataset is available through the Hugging Face Datasets library.
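
For reference, loading the dataset looks roughly like the sketch below. The dataset id "emotion" is an assumption based on the Hub listing (the same dataset is also published as "dair-ai/emotion"):

from datasets import load_dataset

# Load the emotion dataset from the Hugging Face Hub
emotions = load_dataset("emotion")

# Inspect the splits and a sample record
print(emotions)               # DatasetDict with train (16,000), validation (2,000), test (2,000)
print(emotions["train"][0])   # a single example: {'text': ..., 'label': ...}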

Contents

The notebook is divided into the following sections (a condensed code sketch of the core fine-tuning steps follows the list):

  1. Data Loading and Exploration: Loading the dataset and exploring its structure, including the number of samples, column names, and sample texts.
  2. From Datasets to DataFrames: Converting the dataset to pandas DataFrames for easier data manipulation and visualization.
  3. Class Distribution Analysis: Analyzing the class distribution of the dataset using bar plots.
  4. Text Length Analysis: Analyzing the length of the tweets in terms of the number of words.
  5. Tokenization: Tokenizing the text data using the DistilBERT tokenizer.
  6. Model Initialization and Configuration: Initializing the pre-trained DistilBERT model for sequence classification and configuring the training settings.
  7. Training and Evaluation: Fine-tuning the model on the training dataset and evaluating its performance on the validation dataset.
  8. Evaluation Metrics: Computing evaluation metrics such as accuracy and F1 score on the test dataset.
  9. Conclusion: A summary of the notebook's contents and the process of fine-tuning the DistilBERT model for emotion classification.
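
As a condensed guide to sections 5 through 8, the fine-tuning loop looks roughly like the sketch below. The dataset id, label count, and hyperparameters (epochs, batch size) are illustrative assumptions, not necessarily the exact settings used in the notebook:

from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Load the dataset and the DistilBERT tokenizer
emotions = load_dataset("emotion")
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Truncate long tweets and pad each batch to a common length
    return tokenizer(batch["text"], padding=True, truncation=True)

emotions_encoded = emotions.map(tokenize, batched=True)

# Initialize DistilBERT with a 6-way classification head (one per emotion label)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=6)

def compute_metrics(pred):
    # Accuracy and weighted F1, computed from the Trainer's EvalPrediction
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    return {"accuracy": accuracy_score(labels, preds),
            "f1": f1_score(labels, preds, average="weighted")}

# Hyperparameters below are illustrative, not the notebook's exact settings
training_args = TrainingArguments(
    output_dir="distilbert-base-uncased-finetuned-emotion",
    num_train_epochs=2,
    per_device_train_batch_size=64,
    evaluation_strategy="epoch",
)

trainer = Trainer(model=model, args=training_args,
                  train_dataset=emotions_encoded["train"],
                  eval_dataset=emotions_encoded["validation"],
                  compute_metrics=compute_metrics,
                  tokenizer=tokenizer)  # enables dynamic padding at batch time
trainer.train()

# Evaluate the final model on the held-out test split
print(trainer.evaluate(emotions_encoded["test"]))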

Usage

To use the fine-tuned DistilBERT model for emotion classification, follow these steps:

  1. Install the required dependencies listed in the requirements.txt file.
  2. Load the trained model using the Hugging Face Transformers library.
  3. Tokenize the input text using the DistilBERT tokenizer.
  4. Pass the tokenized input through the model to get the predicted emotions, as shown in the example below.

from transformers import TextClassificationPipeline, AutoModelForSequenceClassification, AutoTokenizer

# Load the fine-tuned model and tokenizer
model_name = "ahmettasdemir/distilbert-base-uncased-finetuned-emotion"
model = AutoModelForSequenceClassification.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Create the text classification pipeline
classifier = TextClassificationPipeline(model=model, tokenizer=tokenizer)

# Example text to classify
text = "I'm feeling happy today!"

# Perform text classification
result = classifier(text)

# Print the predicted label
predicted_label = result[0]['label']
print("Predicted Emotion:", predicted_label)