BERT MLM Fine-Tuned — Sentiment Classifier (sm3455/bert-mlm-ft)

A domain-adapted BERT model for binary sentiment classification, trained on the Stanford IMDB dataset using a two-stage approach: Masked Language Modeling (MLM) pretraining on unlabeled reviews followed by supervised fine-tuning for sentiment classification.


Model Details

Model Description

This model follows a two-stage domain adaptation pipeline:

  1. Stage 1 — MLM Pretraining: bert-base-uncased was further pretrained on 50,000 unlabeled IMDB movie reviews using Masked Language Modeling (MLM) with a masking probability of 15% (via DataCollatorForLanguageModeling). This adapts BERT's language representations to the movie review domain before any labeled data is introduced.

  2. Stage 2 — Supervised Fine-Tuning: The domain-adapted MLM checkpoint was then fine-tuned for binary sentiment classification (positive / negative) on the 25,000 labeled IMDB training examples and evaluated on the 25,000-example test set.
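DataCollatorForLanguageModeling applies BERT's standard masking recipe: each non-special token is selected with probability mlm_probability; of the selected tokens, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged, and only the selected positions receive a label. The pure-Python sketch below mirrors that recipe for a single sequence so the Stage 1 objective can be followed without downloading a tokenizer. The helper name mask_tokens, the example token IDs, and the hard-coded constants ([MASK] = 103 and a 30,522-token vocabulary, as in bert-base-uncased) are illustrative assumptions; this is not the collator's own code.

```python
import random

MASK_ID = 103        # [MASK] in the bert-base-uncased vocabulary (assumed)
VOCAB_SIZE = 30522   # bert-base-uncased vocabulary size (assumed)
IGNORE_INDEX = -100  # label value skipped by PyTorch's cross-entropy loss


def mask_tokens(input_ids, special_mask, mlm_probability=0.15, rng=None):
    """Hypothetical helper: return (masked_ids, labels) using the 80/10/10 rule.

    input_ids: token IDs for one sequence.
    special_mask: True for [CLS]/[SEP]/[PAD] positions, which are never masked.
    """
    rng = rng or random.Random()
    masked = list(input_ids)
    labels = [IGNORE_INDEX] * len(input_ids)
    for i, (tok, special) in enumerate(zip(input_ids, special_mask)):
        if special or rng.random() >= mlm_probability:
            continue  # token not selected: no prediction target here
        labels[i] = tok  # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID  # 80% of selected tokens: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return masked, labels


# Illustrative IDs for "[CLS] ... [SEP]"; the middle IDs are arbitrary examples.
ids = [101, 2023, 3185, 2001, 2307, 102]
special = [True, False, False, False, False, True]
masked, labels = mask_tokens(ids, special, mlm_probability=0.15, rng=random.Random(0))
print(masked)
print(labels)
```

Passing a seeded random.Random makes the masking reproducible, which helps when inspecting examples; the real collator performs the same sampling on batched PyTorch tensors.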

  • Developed by: Sai Bhargav Manginapudi
  • Model type: BERT (Bidirectional Encoder Representations from Transformers)
  • Base model: bert-base-uncased
  • Language: English
  • License: Apache 2.0
  • Fine-tuned from: bert-base-uncased → MLM domain-adapted checkpoint → sequence classifier


Uses

Direct Use

This model can be used out-of-the-box for binary sentiment classification on English text — particularly movie reviews or similar informal/opinion-style text.

from transformers import pipeline

classifier = pipeline("text-classification", model="sm3455/bert-mlm-ft")

result = classifier("The movie started with a banger, first half was really nice and then slowly loses its magic.")
print(result)
# -> [{'label': 'LABEL_0' or 'LABEL_1', 'score': <probability of that label>}]

Labels: LABEL_0 = Negative, LABEL_1 = Positive
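Because the pipeline returns the raw LABEL_0 / LABEL_1 names, a small post-processing step can translate them using the mapping above. This is a minimal sketch: the helper name readable is hypothetical, and it assumes the model config keeps the default label names shown in the example output.

```python
# Mapping from the label note above (assumes the default LABEL_0 / LABEL_1 names).
ID2NAME = {"LABEL_0": "Negative", "LABEL_1": "Positive"}


def readable(predictions):
    """Hypothetical helper: turn pipeline output into (sentiment, score) pairs.

    Unknown labels are passed through unchanged.
    """
    return [(ID2NAME.get(p["label"], p["label"]), p["score"]) for p in predictions]


# Hand-written prediction with a made-up score, so no model download is needed.
print(readable([{"label": "LABEL_1", "score": 0.75}]))  # [('Positive', 0.75)]
```

In real use, pass the list returned by classifier(...) straight into readable.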

Downstream Use

The domain-adapted MLM checkpoint (Stage 1) can also be extracted and fine-tuned on other NLP tasks in the movie/entertainment domain — NER, aspect-based sentiment, review summarization, etc.

Out-of-Scope Use

  • Not recommended for formal text, technical documents, or non-English content
  • Not suitable for multi-class or multi-label sentiment tasks without further fine-tuning
  • Should not be used for high-stakes decisions without human review

Bias, Risks, and Limitations

  • Trained exclusively on IMDB movie reviews — performance may degrade on other domains (product reviews, news, social media)
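  • BERT's 512-token limit (which includes the [CLS] and [SEP] tokens) means long reviews are truncated; everything past the limit is dropped, so the model never sees the tail of very long reviews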
  • The IMDB dataset reflects the biases of online English-speaking movie reviewers and may not generalize across demographics or cultures

Recommendations

Test on your target domain before deployment. For non-movie-review text, consider further domain adaptation using Stage 1 MLM pretraining on your own unlabeled corpus before fine-tuning.


How to Get Started with the Model

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sm3455/bert-mlm-ft",
    tokenizer="sm3455/bert-mlm-ft"
)

# Example
print(classifier("This film was an absolute masterpiece from start to finish."))
print(classifier("Terrible pacing, weak characters, and a predictable ending."))

Training Details

Training Data

Dataset: stanfordnlp/imdb

| Split | Size | Use |
|---|---|---|
| unsupervised | 50,000 unlabeled reviews | Stage 1 (MLM pretraining) |
| train | 25,000 labeled reviews | Stage 2 (classification fine-tuning) |
| test | 25,000 labeled reviews | Evaluation |

Labels are balanced: 50% positive, 50% negative in both train and test splits.

Training Procedure

Preprocessing

  • Tokenizer: bert-base-uncased (WordPiece)
  • Padding: max_length (512 tokens)
  • Truncation: enabled
  • Raw text column removed after tokenization, before MLM training; label column retained for classification
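These settings correspond to tokenizing each review with padding="max_length", truncation=True, and max_length=512; the exact tokenizer call used in training is not shown in this card. The sketch below reproduces the effect on a list of WordPiece IDs, which also makes the 512-token limit concrete: at most 510 WordPieces of a review survive once [CLS] and [SEP] are added. The helper pad_and_truncate is hypothetical, token_type_ids are omitted, and the special-token IDs ([CLS] = 101, [SEP] = 102, [PAD] = 0) assume the bert-base-uncased vocabulary.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # bert-base-uncased special-token IDs (assumed)
MAX_LEN = 512


def pad_and_truncate(wordpiece_ids, max_len=MAX_LEN):
    """Hypothetical helper mimicking padding='max_length', truncation=True."""
    body = wordpiece_ids[: max_len - 2]  # keep room for [CLS] and [SEP]; the tail is dropped
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)  # 1 = real token
    pad = max_len - len(input_ids)
    input_ids += [PAD_ID] * pad
    attention_mask += [0] * pad  # 0 = padding, ignored by attention
    return input_ids, attention_mask


# A "review" of 600 arbitrary WordPiece IDs loses its last 90 pieces.
ids, mask = pad_and_truncate(list(range(1000, 1600)))
print(len(ids), sum(mask))  # 512 512
```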

Stage 1 — MLM Pretraining Hyperparameters

| Parameter | Value |
|---|---|
| Base model | bert-base-uncased |
| Dataset | IMDB unsupervised (50K) |
| Epochs | 3 |
| Batch size | 16 per device |
| Gradient accumulation steps | 4 (effective batch = 64) |
| Learning rate | 2e-5 |
| LR scheduler | Linear |
| Warmup steps | 500 |
| Optimizer | AdamW (β1=0.9, β2=0.999, ε=1e-8, weight decay=0.01) |
| MLM probability | 0.20 |
| Precision | FP16 mixed precision |
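As a worked example of what the Stage 1 settings imply, the snippet below estimates the number of optimizer updates and evaluates a linear warmup-then-decay learning rate with the same shape as transformers' get_linear_schedule_with_warmup. It assumes a single GPU. The exact step count reported by Trainer can differ slightly across library versions (for example, in how a final partial accumulation is handled), so treat the numbers as approximate. The function name linear_warmup_lr is illustrative.

```python
import math


def linear_warmup_lr(step, peak_lr, warmup_steps, total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay to 0 at total_steps."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Stage 1 values from the table above; assumes one device.
num_examples, per_device_batch, grad_accum, epochs = 50_000, 16, 4, 3
batches_per_epoch = math.ceil(num_examples / per_device_batch)  # 3125
updates_per_epoch = batches_per_epoch // grad_accum  # 781 (effective batch of 64)
total_updates = updates_per_epoch * epochs  # 2343
warmup = 500

print(total_updates, round(warmup / total_updates, 3))  # 2343 0.213
print(linear_warmup_lr(250, 2e-5, warmup, total_updates))  # half of the 2e-5 peak
```

Under these assumptions, the 500 warmup steps cover roughly a fifth of about 2,343 updates, so the learning rate reaches its 2e-5 peak partway through the first epoch and then decays linearly to zero.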

Stage 2 — Classification Fine-Tuning Hyperparameters

| Parameter | Value |
|---|---|
| Base model | MLM checkpoint (Stage 1 output) |
| Dataset | IMDB train (25K) |
| Epochs | 3 |
| Batch size | 16 per device |
| Gradient accumulation steps | 4 (effective batch = 64) |
| Learning rate | 2e-5 |
| LR scheduler | Linear |
| Optimizer | AdamW (β1=0.9, β2=0.999, ε=1e-8, weight decay=0.01) |
| Eval strategy | Every 50 steps |
| Precision | FP16 mixed precision |
| Experiment tracking | Weights & Biases (W&B) |

Evaluation

Testing Data

IMDB test split — 25,000 labeled English movie reviews, balanced across positive and negative classes.

Metrics

  • Accuracy: fraction of correctly classified reviews
  • F1 Score: macro-averaged F1, i.e. the harmonic mean of precision and recall computed for each class and then averaged over the two classes
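Both metrics can be computed from gold and predicted labels as sketched below; in practice, sklearn.metrics.accuracy_score and f1_score(..., average="macro") give the same values. The function names are illustrative, and labels 0 / 1 follow this card's LABEL_0 = Negative, LABEL_1 = Positive convention.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores (macro averaging)."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Toy example: one negative review misclassified as positive.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
print(accuracy(y_true, y_pred))  # 0.75
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

On a balanced test set such as IMDB's, macro F1 and accuracy tend to be close unless errors are heavily skewed toward one class, which is consistent with the near-identical accuracy and F1 reported under Results.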

Results

| Metric | Score |
|---|---|
| Accuracy | 94.4% |
| F1 Score (macro) | 0.944 |

The two-stage domain adaptation approach (MLM pretraining → supervised fine-tuning) outperforms direct fine-tuning of vanilla bert-base-uncased, demonstrating the value of domain adaptation on unlabeled in-domain data before introducing labels.


Technical Specifications

Model Architecture

  • Architecture: BERT (bert-base-uncased) with a sequence classification head (BertForSequenceClassification)
  • Parameters: ~110M (BERT-base)
  • Number of labels: 2 (binary classification)
  • Objective Stage 1: Masked Language Modeling (MLM)
  • Objective Stage 2: Cross-entropy classification loss
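For the two-label head, the Stage 2 objective is the standard softmax cross-entropy over the two logits, and by default the text-classification pipeline's score is the softmax probability of the predicted label. The sketch below spells this out in plain Python. The logits are made-up numbers, not outputs of this model, and in training PyTorch's CrossEntropyLoss computes the same quantity averaged over a batch.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Negative log-probability of the gold class (0 = negative, 1 = positive)."""
    return -math.log(softmax(logits)[target])


logits = [-1.2, 2.3]  # made-up logits from the classification head
probs = softmax(logits)
print(probs)  # the larger probability is what the pipeline reports as 'score'
print(cross_entropy(logits, target=1))  # small loss: the gold class already dominates
```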

Compute Infrastructure

  • Hardware: GPU (FP16 mixed precision training)
  • Framework: PyTorch + HuggingFace Transformers
  • Experiment tracking: Weights & Biases (W&B)

Live Demo

A Gradio demo is available directly on the HuggingFace model page. Enter any movie review text and the model will return a sentiment prediction with a confidence score.

import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="sm3455/bert-mlm-ft")

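# Translate the raw pipeline labels into readable names (LABEL_0 = Negative, LABEL_1 = Positive)
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Positive"}

def classify_text(text):
    result = classifier(text)[0]  # the pipeline returns a list with one dict per input string
    label = LABEL_NAMES.get(result["label"], result["label"])
    score = result["score"]
    return f"{label} (confidence: {score:.2f})"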

iface = gr.Interface(
    fn=classify_text,
    inputs=gr.Textbox(lines=2, placeholder="Enter your movie review here..."),
    outputs="text",
    title="BERT Sentiment Classifier",
    description="Domain-adapted BERT fine-tuned on IMDB for binary sentiment classification."
)

iface.launch()

Citation

If you use this model, please cite:

@misc{manginapudi2024bertmlmft,
  author = {Sai Bhargav Manginapudi},
  title = {Domain-Adaptive BERT for Sentiment Classification},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sm3455/bert-mlm-ft}}
}

Model Card Author

Sai Bhargav Manginapudi
M.S. Computer and Information Science, New Jersey Institute of Technology (Dec 2024)
LinkedIn | GitHub | saibhargav052000@gmail.com

Safetensors weights: 0.1B params, tensor type F32