DistilBERT Fine-Tuned on IMDB for Masked Language Modeling

Model Description

This model is a fine-tuned version of distilbert-base-uncased for masked language modeling, trained on the IMDB dataset.

Model Training Details

Training Dataset

  • Dataset: IMDB dataset from Hugging Face
  • Dataset Split:
    • Train: 25,000 samples
    • Test: 25,000 samples
    • Unsupervised: 50,000 samples
  • Training and Unsupervised Data Concatenation: training was performed on the concatenation of the train and unsupervised splits (75,000 samples in total), as sketched below.
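
A minimal sketch of building that combined split with the datasets library, assuming the standard "imdb" identifier on the Hugging Face Hub:

from datasets import load_dataset, concatenate_datasets

# Load the IMDB dataset; it ships with "train", "test", and "unsupervised" splits
imdb = load_dataset("imdb")

# MLM fine-tuning does not need labels, so the labeled train split (25,000)
# and the unsupervised split (50,000) can be combined into one corpus
train_data = concatenate_datasets([imdb["train"], imdb["unsupervised"]])
print(len(train_data))  # 75000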

Training Arguments

The following parameters were used during fine-tuning:

  • Number of Training Epochs: 10
  • Overwrite Output Directory: True
  • Evaluation Strategy: steps
    • Evaluation Steps: 500
  • Checkpoint Save Strategy: steps
    • Save Steps: 500
  • Load Best Model at End: True
  • Metric for Best Model: eval_loss
    • Direction: Lower eval_loss is better (greater_is_better = False).
  • Learning Rate: 2e-5
  • Weight Decay: 0.01
  • Per-Device Batch Size (Training): 32
  • Per-Device Batch Size (Evaluation): 32
  • Warmup Steps: 1,000
  • Mixed Precision Training: Enabled (fp16 = True)
  • Logging Steps: 100
  • Gradient Accumulation Steps: 2
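
These settings map directly onto a Hugging Face TrainingArguments object, as in the sketch below; the output_dir name is illustrative, not taken from the card:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-finetuned-imdb-mlm",  # illustrative name
    overwrite_output_dir=True,
    num_train_epochs=10,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=1000,
    fp16=True,
    logging_steps=100,
    gradient_accumulation_steps=2,
)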

Early Stopping

  • The model was configured with early stopping to prevent overfitting.
  • Training stopped after 5.87 epochs (21,000 steps), as there was no significant improvement in eval_loss.
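
In the transformers Trainer, early stopping on eval_loss is typically added through EarlyStoppingCallback, which relies on load_best_model_at_end and metric_for_best_model being set as above. A minimal sketch; the patience value is an assumption, since the card does not state the one used:

from transformers import EarlyStoppingCallback

# Stop training once eval_loss has failed to improve for `patience`
# consecutive evaluations; patience=3 is assumed, not from the card
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)

# Passed to the Trainer via: Trainer(..., callbacks=[early_stopping])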

Evaluation Results

  • Metric Used: eval_loss
  • Final Perplexity: 8.34
  • Best Checkpoint: the checkpoint saved when early stopping triggered (step 21,000).
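
Perplexity is the exponential of the cross-entropy eval_loss, so the reported 8.34 corresponds to an eval_loss of roughly 2.12. A minimal check:

import math

eval_loss = 2.121  # approximate value recovered from the reported perplexity
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ~8.34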

Model Usage

The model can be used for masked language modeling via the Hugging Face fill-mask pipeline. Example:

from transformers import pipeline

# Load the fine-tuned checkpoint into a fill-mask pipeline
mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm")

text = "This is a great [MASK]."
predictions = mask_filler(text)

# Each prediction is a dict with the filled-in sequence, token, and score
for pred in predictions:
    print(f">>> {pred['sequence']}")

Output Example:

>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great documentary.
>>> This is a great story.
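
Each prediction also carries a confidence score, so the loop above can be extended to print it:

for pred in predictions:
    print(f">>> {pred['sequence']} (score: {pred['score']:.3f})")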