DistilBERT SST-2 Sentiment Analysis Model
This repository contains a fine-tuned DistilBERT model for sentiment analysis, trained on a subset of the SST-2 dataset. The model, tokenizer, and datasets are provided for educational purposes.
Model Details
- Model Name: DistilBERT SST-2 Sentiment Analysis
- Architecture: DistilBERT (distilbert-base-uncased)
- Task: Binary Sentiment Classification
- Dataset: SST-2 (Subset: 600 training samples, 150 test samples)
- Accuracy: 89% on the validation subset
Model Components
- Model: The model is a DistilBERT model fine-tuned for binary sentiment analysis (positive/negative).
- Tokenizer: `distilbert-base-uncased`, which matches the base DistilBERT model.
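For quick smoke-testing, the snippet below is a minimal inference sketch. It assumes the repository files have been downloaded to a local directory (`./model_dir` is an illustrative path, not part of this repo), and that label 0 maps to negative and 1 to positive, which is the conventional SST-2 mapping; check `config.json`'s `id2label` to confirm.

```python
# Minimal inference sketch; "./model_dir" is an illustrative local path
# containing the files listed under "Files Included" below.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("./model_dir")
model = AutoModelForSequenceClassification.from_pretrained("./model_dir")
model.eval()

inputs = tokenizer(
    "A moving and thoughtful film.",
    return_tensors="pt",
    truncation=True,
    max_length=512,
)
with torch.no_grad():
    logits = model(**inputs).logits

# Assumed label mapping: 0 = negative, 1 = positive (verify via config.json).
prediction = logits.argmax(dim=-1).item()
print("positive" if prediction == 1 else "negative")
```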
Datasets
This repository also includes the datasets used to train and evaluate the model:
- Training Dataset: 600 samples from the SST-2 training set, saved in Parquet format.
- Test Dataset: 150 samples from the SST-2 validation set, saved in Parquet format.
The datasets were tokenized using the DistilBERT tokenizer with the following preprocessing steps:
- Padding: Sentences are padded to the longest sentence in the batch.
- Truncation: Sentences longer than 512 tokens are truncated.
- Max Length: 512 tokens.
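The sketch below shows how this preprocessing could be reproduced with the `datasets` library and the DistilBERT tokenizer. The contiguous split slices are an assumption (the actual subset may have been sampled differently); the output file names match those listed in the next section.

```python
# Preprocessing sketch; assumes the GLUE "sst2" config from the datasets hub.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

def tokenize(batch):
    # Pad to the longest sentence in the batch, truncate at 512 tokens.
    return tokenizer(
        batch["sentence"], padding="longest", truncation=True, max_length=512
    )

train = load_dataset("glue", "sst2", split="train[:600]").map(tokenize, batched=True)
test = load_dataset("glue", "sst2", split="validation[:150]").map(tokenize, batched=True)

train.to_parquet("train_dataset.parquet")
test.to_parquet("test_dataset.parquet")
```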
Files Included
- `pytorch_model.bin`: The model weights.
- `config.json`: The model configuration.
- `tokenizer_config.json`: The tokenizer configuration.
- `vocab.txt`: The tokenizer vocabulary file.
- `train_dataset.parquet`: Tokenized training dataset (600 samples) in Parquet format.
- `test_dataset.parquet`: Tokenized test dataset (150 samples) in Parquet format.
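To inspect the saved splits, the Parquet files can be read back directly with `datasets`; a brief sketch:

```python
# Loading sketch: read the tokenized splits back into Dataset objects.
from datasets import Dataset

train = Dataset.from_parquet("train_dataset.parquet")
test = Dataset.from_parquet("test_dataset.parquet")
print(train)  # shows columns such as input_ids, attention_mask, label
```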
Training Details
Training Configuration
The model was fine-tuned using the following hyperparameters:
- Learning Rate: 2e-5
- Batch Size: 16 (training), 64 (evaluation)
- Number of Epochs: 4
- Gradient Accumulation Steps: 3
- Weight Decay: 0.01
- Evaluation Strategy: Evaluated at the end of each epoch
- Logging: Logs were generated every 100 steps
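Expressed as a `TrainingArguments` sketch, these hyperparameters look roughly as follows. The output path is illustrative, and the evaluation argument was named `evaluation_strategy` in older `transformers` releases and `eval_strategy` in newer ones.

```python
# TrainingArguments sketch mirroring the hyperparameters listed above.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./results",            # illustrative output path
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=64,
    num_train_epochs=4,
    gradient_accumulation_steps=3,     # effective train batch size: 16 * 3 = 48
    weight_decay=0.01,
    eval_strategy="epoch",             # evaluate at the end of each epoch
    save_strategy="epoch",             # required for load_best_model_at_end
    logging_steps=100,
    load_best_model_at_end=True,       # see "Training Process" below
    metric_for_best_model="accuracy",
)
```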
Training Process
The model was trained with the Hugging Face `Trainer` API, which provides a high-level interface for training and evaluation. The model was evaluated at the end of each epoch to monitor accuracy, and the checkpoint with the best validation accuracy was loaded at the end of training.
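A wiring sketch of that setup, assuming `training_args`, `tokenizer`, `train`, and `test` from the earlier snippets are in scope; `compute_metrics` is a hypothetical helper, not a file in this repo.

```python
# Trainer wiring sketch; assumes training_args, tokenizer, train, and test
# from the earlier snippets. compute_metrics is a hypothetical helper.
import numpy as np
from transformers import AutoModelForSequenceClassification, Trainer

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {"accuracy": float((predictions == labels).mean())}

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train,
    eval_dataset=test,
    tokenizer=tokenizer,  # enables dynamic padding; `processing_class` in newer versions
    compute_metrics=compute_metrics,
)
trainer.train()
```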
Model Performance
- Validation Accuracy: 89%
The validation accuracy was calculated on the 150 samples from the SST-2 validation set.
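With the trainer from the previous sketch, the figure can be reproduced along these lines:

```python
# Evaluation sketch: computes accuracy on the 150-sample test split.
metrics = trainer.evaluate()
print(metrics["eval_accuracy"])
```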
Usage Notes
This model is provided for educational purposes. It may not be suitable for production use without further testing and validation on larger datasets.
Acknowledgements
- Hugging Face: For providing the `transformers` library and dataset access.
- GLUE Benchmark: For the SST-2 dataset used in this project.