DistilBERT Fine-Tuned on IMDB for Masked Language Modeling

Model Description

This model is a fine-tuned version of distilbert-base-uncased for masked language modeling, trained on the IMDB dataset.

Model Training Details

Training Dataset

  • Dataset: IMDB dataset from Hugging Face
  • Dataset Split:
    • Train: 25,000 samples
    • Test: 25,000 samples
    • Unsupervised: 50,000 samples
  • Training and Unsupervised Data Concatenation: training was performed on the concatenation of the train and unsupervised splits (75,000 samples in total), as sketched below.
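
A minimal sketch of building that combined split with the datasets library, assuming the standard "imdb" identifier on the Hugging Face Hub:

from datasets import load_dataset, concatenate_datasets

# Load the IMDB dataset; it ships with "train", "test", and "unsupervised" splits
imdb = load_dataset("imdb")

# MLM fine-tuning does not need labels, so the labeled train split (25,000)
# and the unsupervised split (50,000) can be combined into one corpus
train_data = concatenate_datasets([imdb["train"], imdb["unsupervised"]])
print(len(train_data))  # 75000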

Training Arguments

The following parameters were used during fine-tuning:

  • Number of Training Epochs: 10
  • Overwrite Output Directory: True
  • Evaluation Strategy: steps
    • Evaluation Steps: 500
  • Checkpoint Save Strategy: steps
    • Save Steps: 500
  • Load Best Model at End: True
  • Metric for Best Model: eval_loss
    • Direction: Lower eval_loss is better (greater_is_better = False).
  • Learning Rate: 2e-5
  • Weight Decay: 0.01
  • Per-Device Batch Size (Training): 32
  • Per-Device Batch Size (Evaluation): 32
  • Warmup Steps: 1,000
  • Mixed Precision Training: Enabled (fp16 = True)
  • Logging Steps: 100
  • Gradient Accumulation Steps: 2
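
These settings map directly onto a Hugging Face TrainingArguments object, as in the sketch below; the output_dir name is illustrative, not taken from the card:

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="distilbert-finetuned-imdb-mlm",  # illustrative name
    overwrite_output_dir=True,
    num_train_epochs=10,
    evaluation_strategy="steps",
    eval_steps=500,
    save_strategy="steps",
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
    learning_rate=2e-5,
    weight_decay=0.01,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_steps=1000,
    fp16=True,
    logging_steps=100,
    gradient_accumulation_steps=2,
)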

Early Stopping

  • The model was configured with early stopping to prevent overfitting.
  • Training stopped after 5.87 epochs (21,000 steps), as there was no significant improvement in eval_loss.
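
In the transformers Trainer, early stopping on eval_loss is typically added through EarlyStoppingCallback, which relies on load_best_model_at_end and metric_for_best_model being set as above. A minimal sketch; the patience value is an assumption, since the card does not state the one used:

from transformers import EarlyStoppingCallback

# Stop training once eval_loss has failed to improve for `patience`
# consecutive evaluations; patience=3 is assumed, not from the card
early_stopping = EarlyStoppingCallback(early_stopping_patience=3)

# Passed to the Trainer via: Trainer(..., callbacks=[early_stopping])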

Evaluation Results

  • Metric Used: eval_loss
  • Final Perplexity: 8.34
  • Best Checkpoint: the checkpoint saved when early stopping triggered (step 21,000).
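
Perplexity is the exponential of the cross-entropy eval_loss, so the reported 8.34 corresponds to an eval_loss of roughly 2.12. A minimal check:

import math

eval_loss = 2.121  # approximate value recovered from the reported perplexity
perplexity = math.exp(eval_loss)
print(f"Perplexity: {perplexity:.2f}")  # ~8.34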

Model Usage

The model can be used for masked language modeling via the Hugging Face fill-mask pipeline. Example:

from transformers import pipeline

# Load the fine-tuned checkpoint into a fill-mask pipeline
mask_filler = pipeline("fill-mask", model="Prikshit7766/distilbert-finetuned-imdb-mlm")

text = "This is a great [MASK]."
predictions = mask_filler(text)

# Each prediction is a dict with the filled-in sequence, token, and score
for pred in predictions:
    print(f">>> {pred['sequence']}")

Output Example:

>>> This is a great movie.
>>> This is a great film.
>>> This is a great show.
>>> This is a great documentary.
>>> This is a great story.
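
Each prediction also carries a confidence score, so the loop above can be extended to print it:

for pred in predictions:
    print(f">>> {pred['sequence']} (score: {pred['score']:.3f})")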