BERT MLM Fine-Tuned — Sentiment Classifier (sm3455/bert-mlm-ft)

A domain-adapted BERT model for binary sentiment classification, trained on the Stanford IMDB dataset using a two-stage approach: Masked Language Modeling (MLM) pretraining on unlabeled reviews followed by supervised fine-tuning for sentiment classification.

Model Details

Model Description

This model follows a two-stage domain adaptation pipeline:

Stage 1 — MLM Pretraining: bert-base-uncased was further pretrained on 50,000 unlabeled IMDB movie reviews using Masked Language Modeling (MLM) with a masking probability of 15% (via DataCollatorForLanguageModeling). This adapts BERT's language representations to the movie review domain before any labeled data is introduced.
Stage 2 — Supervised Fine-Tuning: The domain-adapted MLM checkpoint was then fine-tuned for binary sentiment classification (positive / negative) on 25,000 labeled IMDB training examples, evaluated on the 25,000-example test set.
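
The Stage 1 masking step can be illustrated without any framework. The sketch below is a simplified, pure-Python imitation of what DataCollatorForLanguageModeling does by default (select tokens at the masking rate, then apply the standard 80% [MASK] / 10% random / 10% unchanged split); it is not the author's training code. The function name mask_tokens and the toy token IDs are assumptions made for illustration, and the masking rate is left as a parameter.

```python
import random

# Special-token IDs from the bert-base-uncased vocabulary
PAD_ID, CLS_ID, SEP_ID, MASK_ID = 0, 101, 102, 103
SPECIAL_IDS = {PAD_ID, CLS_ID, SEP_ID}
VOCAB_SIZE = 30522   # bert-base-uncased vocabulary size
IGNORE_INDEX = -100  # label value that the loss skips


def mask_tokens(input_ids, mlm_probability=0.15, rng=None):
    """Return (masked_ids, labels) for one tokenized review (hypothetical helper)."""
    rng = rng or random.Random()
    masked, labels = list(input_ids), [IGNORE_INDEX] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if tok in SPECIAL_IDS or rng.random() >= mlm_probability:
            continue  # token is left alone and not scored
        labels[i] = tok  # the model must predict the original token here
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK_ID                    # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[i] = rng.randrange(VOCAB_SIZE)  # 10%: replace with a random token
        # remaining 10%: keep the original token in place
    return masked, labels


# Toy input: [CLS] + 8 arbitrary word-piece IDs + [SEP]
ids = [CLS_ID, 2023, 3185, 2001, 2428, 2204, 1998, 4569, 1012, SEP_ID]
masked_ids, labels = mask_tokens(ids, rng=random.Random(0))
print(masked_ids)
print(labels)
```

Positions whose label is -100 contribute nothing to the MLM loss, so the model is scored only on the tokens that were selected for masking.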

Developed by: Sai Bhargav Manginapudi
Model type: BERT (Bidirectional Encoder Representations from Transformers)
Base model: bert-base-uncased
Language: English
License: Apache 2.0
Fine-tuned from: bert-base-uncased → MLM domain-adapted checkpoint → sequence classifier

Model Sources

Repository: sm3455/bert-mlm-ft on HuggingFace Hub
Demo: Live Gradio Demo — try it directly in your browser

Uses

Direct Use

This model can be used out-of-the-box for binary sentiment classification on English text — particularly movie reviews or similar informal/opinion-style text.

from transformers import pipeline

classifier = pipeline("text-classification", model="sm3455/bert-mlm-ft")

result = classifier("The movie started with a banger, first half was really nice and then slowly loses its magic.")
print(result)
# Output has the form [{'label': 'LABEL_0' or 'LABEL_1', 'score': <float between 0 and 1>}]

Labels: LABEL_0 = Negative, LABEL_1 = Positive
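
Because the pipeline returns the raw LABEL_0 / LABEL_1 identifiers, a small post-processing step may be convenient. The following is a minimal sketch; the helper name to_readable and the example score are hypothetical and only illustrate the shape of the data.

```python
# Mapping taken from the "Labels" line above
LABEL_NAMES = {"LABEL_0": "Negative", "LABEL_1": "Positive"}


def to_readable(results):
    """Convert pipeline output dicts into (sentiment, score) tuples."""
    return [
        (LABEL_NAMES.get(r["label"], r["label"]), round(r["score"], 4))
        for r in results
    ]


# Hypothetical pipeline output, used here only to show the data shape
example = [{"label": "LABEL_1", "score": 0.9731}]
print(to_readable(example))  # [('Positive', 0.9731)]
```

Unknown labels are passed through unchanged, so the helper keeps working if the model configuration is later updated with descriptive label names.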

Downstream Use

The domain-adapted MLM checkpoint (Stage 1) can also be extracted and fine-tuned on other NLP tasks in the movie/entertainment domain — NER, aspect-based sentiment, review summarization, etc.

Out-of-Scope Use

Not recommended for formal text, technical documents, or non-English content
Not suitable for multi-class or multi-label sentiment tasks without further fine-tuning
Should not be used for high-stakes decisions without human review

Bias, Risks, and Limitations

Trained exclusively on IMDB movie reviews — performance may degrade on other domains (product reviews, news, social media)
BERT's 512-token limit truncates long reviews; very long inputs may lose tail context
The IMDB dataset reflects the biases of online English-speaking movie reviewers and may not generalize across demographics or cultures

Recommendations

Test on your target domain before deployment. For non-movie-review text, consider further domain adaptation using Stage 1 MLM pretraining on your own unlabeled corpus before fine-tuning.

How to Get Started with the Model

from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="sm3455/bert-mlm-ft",
    tokenizer="sm3455/bert-mlm-ft"
)

# Example
print(classifier("This film was an absolute masterpiece from start to finish."))
print(classifier("Terrible pacing, weak characters, and a predictable ending."))

Training Details

Training Data

Dataset: stanfordnlp/imdb

Split	Size	Use
`unsupervised`	50,000 unlabeled reviews	Stage 1 — MLM pretraining
`train`	25,000 labeled reviews	Stage 2 — Classification fine-tuning
`test`	25,000 labeled reviews	Evaluation

Labels are balanced: 50% positive, 50% negative in both train and test splits.

Training Procedure

Preprocessing

Tokenizer: bert-base-uncased (WordPiece)
Padding: max_length (512 tokens)
Truncation: enabled
Text column removed before MLM training; label column retained for classification
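
To make the padding and truncation settings concrete, here is a pure-Python sketch of what fixed-length encoding does to a sequence of word-piece IDs. It mimics the effect of tokenizing with padding="max_length" and truncation enabled, but it is not the tokenizer itself: encode_fixed_length and the dropped_tokens field are illustrative assumptions, and the input IDs are arbitrary.

```python
PAD_ID, CLS_ID, SEP_ID = 0, 101, 102  # bert-base-uncased special-token IDs


def encode_fixed_length(wordpiece_ids, max_length=512):
    """Add [CLS]/[SEP], truncate the tail, and pad to exactly max_length."""
    body = wordpiece_ids[: max_length - 2]  # reserve 2 slots for [CLS] and [SEP]
    input_ids = [CLS_ID] + body + [SEP_ID]
    attention_mask = [1] * len(input_ids)   # 1 = real token
    pad = max_length - len(input_ids)
    return {
        "input_ids": input_ids + [PAD_ID] * pad,
        "attention_mask": attention_mask + [0] * pad,  # 0 = padding, ignored by attention
        "dropped_tokens": len(wordpiece_ids) - len(body),  # illustrative extra field
    }


short = encode_fixed_length(list(range(1000, 1100)))  # 100 word pieces
long = encode_fixed_length(list(range(1000, 1700)))   # 700 word pieces
print(sum(short["attention_mask"]), short["dropped_tokens"])  # 102 0
print(sum(long["attention_mask"]), long["dropped_tokens"])    # 512 190
```

The second example shows why long reviews lose their ending: everything past the first 510 word pieces is discarded before the model sees the input.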

Stage 1 — MLM Pretraining Hyperparameters

Parameter	Value
Base model	`bert-base-uncased`
Dataset	IMDB unsupervised (50K)
Epochs	3
Batch size	16 per device
Gradient accumulation steps	4 (effective batch = 64)
Learning rate	2e-5
LR scheduler	Linear
Warmup steps	500
Optimizer	AdamW (β1=0.9, β2=0.999, ε=1e-8, weight decay=0.01)
MLM probability	0.20
Precision	FP16 mixed precision
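
The batch and scheduler settings in the table above imply a particular number of optimizer updates and a particular learning-rate curve. The sketch below works both out in plain Python. It assumes a single GPU (the card does not state the device count), and the step count is approximate because the Trainer's exact figure depends on how it handles the final partial accumulation batch. The function names are hypothetical.

```python
import math


def optimizer_steps(num_examples, per_device_batch, grad_accum, epochs, num_devices=1):
    """Approximate number of optimizer updates for a full training run."""
    effective_batch = per_device_batch * grad_accum * num_devices
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    return effective_batch, steps_per_epoch * epochs


def linear_lr(step, total_steps, base_lr=2e-5, warmup_steps=500):
    """Linear warmup to base_lr, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
    return base_lr * remaining


effective_batch, total_steps = optimizer_steps(50_000, 16, 4, 3)
print(effective_batch, total_steps)  # 64 2346
for step in (0, 250, 500, total_steps):
    print(step, linear_lr(step, total_steps))
```

Under these assumptions the 500 warmup steps cover roughly the first fifth of Stage 1 training, after which the learning rate decays linearly from 2e-5 to zero.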

Stage 2 — Classification Fine-Tuning Hyperparameters

Parameter	Value
Base model	MLM checkpoint (Stage 1 output)
Dataset	IMDB train (25K)
Epochs	3
Batch size	16 per device
Gradient accumulation steps	4 (effective batch = 64)
Learning rate	2e-5
LR scheduler	Linear
Optimizer	AdamW (β1=0.9, β2=0.999, ε=1e-8, weight decay=0.01)
Eval strategy	Every 50 steps
Precision	FP16 mixed precision
Experiment tracking	Weights & Biases (W&B)
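
For readers who want to reproduce Stage 2, the table above maps onto transformers.TrainingArguments fields roughly as follows. This is a sketch of one plausible configuration, not the author's actual script: the values come from the table, while the dictionary name and the output_dir path in the usage comment are hypothetical.

```python
# Keyword arguments named after transformers.TrainingArguments fields.
# Values are copied from the Stage 2 table; nothing here was run by the author.
stage2_args = {
    "num_train_epochs": 3,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 4,
    "learning_rate": 2e-5,
    "lr_scheduler_type": "linear",
    "weight_decay": 0.01,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "eval_strategy": "steps",  # called "evaluation_strategy" in older transformers releases
    "eval_steps": 50,
    "fp16": True,
    "report_to": "wandb",
}

effective_batch = (
    stage2_args["per_device_train_batch_size"]
    * stage2_args["gradient_accumulation_steps"]
)
print(effective_batch)  # 64 on a single device

# Usage (requires transformers; output_dir is a hypothetical path):
# from transformers import TrainingArguments
# args = TrainingArguments(output_dir="./stage2-out", **stage2_args)
```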

Evaluation

Testing Data

IMDB test split — 25,000 labeled English movie reviews, balanced across positive and negative classes.

Metrics

Accuracy: fraction of correctly classified reviews
F1 Score: harmonic mean of precision and recall, computed per class and macro-averaged over the two classes
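
Both metrics are simple to compute by hand. The sketch below implements accuracy and macro-averaged F1 in plain Python on a toy set of labels; the data is invented for illustration and is not drawn from the IMDB test split.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_for_class(y_true, y_pred, cls):
    """F1 for one class, treating that class as the positive one."""
    tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
    fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
    fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true, y_pred, classes=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Toy labels (0 = Negative, 1 = Positive)
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred))  # 0.75
print(macro_f1(y_true, y_pred))  # 0.75
```

On a balanced test set where errors are spread evenly across the two classes, accuracy and macro F1 land very close together, as they do in this toy example.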

Results

Metric	Score
Accuracy	94.4%
F1 Score	0.944

The two-stage domain adaptation approach (MLM pretraining → supervised fine-tuning) is reported to outperform direct fine-tuning of vanilla bert-base-uncased, which suggests that adapting on unlabeled in-domain data before introducing labels is worthwhile. The baseline score for direct fine-tuning is not listed here.

Technical Specifications

Model Architecture

Architecture: BERT (bert-base-uncased) with a sequence classification head (BertForSequenceClassification)
Parameters: ~110M (BERT-base)
Number of labels: 2 (binary classification)
Objective Stage 1: Masked Language Modeling (MLM)
Objective Stage 2: Cross-entropy classification loss
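
The Stage 2 objective can be written out for a single example. The sketch below computes the cross-entropy loss from a pair of classifier logits in plain Python; it is meant to show the arithmetic, and the logit values are invented.

```python
import math


def cross_entropy(logits, label):
    """Negative log-likelihood of the true class under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


# Logits are ordered [LABEL_0 (Negative), LABEL_1 (Positive)]
print(cross_entropy([0.0, 0.0], 1))   # ln 2: the model is undecided
print(cross_entropy([-2.0, 3.0], 1))  # small loss: confident and correct
print(cross_entropy([-2.0, 3.0], 0))  # large loss: confident and wrong
```

During fine-tuning this quantity is averaged over the batch, and its gradient updates both the classification head and the underlying BERT encoder.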

Compute Infrastructure

Hardware: GPU (FP16 mixed precision training)
Framework: PyTorch + HuggingFace Transformers
Experiment tracking: Weights & Biases (W&B)

Live Demo

A Gradio demo is available directly on the HuggingFace model page. Enter any movie review text and the model will return a sentiment prediction with a confidence score.

import gradio as gr
from transformers import pipeline

classifier = pipeline("text-classification", model="sm3455/bert-mlm-ft")

def classify_text(text):
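    # truncation=True keeps reviews longer than BERT's 512-token limit from raising an error
    result = classifier(text, truncation=True)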
    label = result[0]['label']
    score = result[0]['score']
    return f"{label} (confidence: {score:.2f})"

iface = gr.Interface(
    fn=classify_text,
    inputs=gr.Textbox(lines=2, placeholder="Enter your movie review here..."),
    outputs="text",
    title="BERT Sentiment Classifier",
    description="Domain-adapted BERT fine-tuned on IMDB for binary sentiment classification."
)

iface.launch()

Citation

If you use this model, please cite:

@misc{manginapudi2024bertmlmft,
  author = {Sai Bhargav Manginapudi},
  title = {Domain-Adaptive BERT for Sentiment Classification},
  year = {2024},
  publisher = {HuggingFace},
  howpublished = {\url{https://huggingface.co/sm3455/bert-mlm-ft}}
}

Model Card Author

Sai Bhargav Manginapudi
M.S. Computer and Information Science, New Jersey Institute of Technology (Dec 2024)
LinkedIn | GitHub | saibhargav052000@gmail.com
