
bert-finetuned-mrpc: BERT Fine-tuned for Sequence Classification (MRPC)

This model is a fine-tuned version of bert-base-uncased for sentence-pair classification (paraphrase detection) on the MRPC dataset.


Dataset

  • Name: MRPC (Microsoft Research Paraphrase Corpus)

  • Description: The MRPC dataset consists of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.

  • Source: The dataset is part of the GLUE benchmark.

Model description

This model is a fine-tuned version of bert-base-uncased, trained to determine whether two sentences are paraphrases of each other. It predicts label 1 if the sentences are semantically equivalent and label 0 if they are not.
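For sentence-pair tasks such as MRPC, BERT reads both sentences as a single sequence, `[CLS] sentence1 [SEP] sentence2 [SEP]`. Token type (segment) IDs are 0 for the first sentence and 1 for the second. The plain-Python sketch below illustrates that layout and how a 512-token limit would be enforced. It is a simplification: whitespace splitting stands in for the real WordPiece tokenizer, and `encode_pair` is a hypothetical helper, not a library function.

```python
from typing import List, Tuple

MAX_LEN = 512  # maximum sequence length listed in this card


def encode_pair(sentence1: str, sentence2: str,
                max_len: int = MAX_LEN) -> Tuple[List[str], List[int]]:
    """Build a BERT-style pair input: [CLS] A [SEP] B [SEP].

    Whitespace splitting is a stand-in for WordPiece (simplification).
    Returns the token list and the matching token type (segment) IDs.
    """
    a = sentence1.lower().split()  # lowercase, as an uncased model expects
    b = sentence2.lower().split()
    budget = max_len - 3  # room left after [CLS] and the two [SEP] tokens
    # "Longest first" truncation: trim the longer sentence one token at a time.
    while len(a) + len(b) > budget:
        if len(a) >= len(b):
            a.pop()
        else:
            b.pop()
    tokens = ["[CLS]"] + a + ["[SEP]"] + b + ["[SEP]"]
    # Segment 0 covers [CLS], sentence A and its [SEP]; segment 1 covers the rest.
    token_type_ids = [0] * (len(a) + 2) + [1] * (len(b) + 1)
    return tokens, token_type_ids


tokens, types = encode_pair("The cat sat.", "A cat was sitting.")
print(tokens)
print(types)
```

The real tokenizer also maps each token to a vocabulary ID and adds an attention mask. The segment layout, however, follows the same pattern.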

  • Model architecture: BertForSequenceClassification
  • Task: sequence classification (sentence pairs)
  • Training dataset: GLUE MRPC
  • Number of parameters: 109,483,778
  • Maximum sequence length: 512
  • Vocab size: 30,522
  • Hidden size: 768
  • Number of attention heads: 12
  • Number of hidden layers: 12
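The listed parameter count can be reproduced from the architecture values above. The sketch below does that arithmetic. It assumes two values that this card does not state: a feed-forward (intermediate) size of 3072 and two token-type embeddings, both from the standard bert-base-uncased configuration. It also assumes a two-way classification head.

```python
# Architecture values from this card, plus two assumed values taken from
# the standard bert-base-uncased configuration (not stated in this card).
VOCAB_SIZE = 30522
HIDDEN = 768
NUM_LAYERS = 12
MAX_POSITIONS = 512
INTERMEDIATE = 3072  # assumption: bert-base-uncased feed-forward size
TYPE_VOCAB = 2       # assumption: two segment (token type) embeddings
NUM_LABELS = 2       # binary output: equivalent (1) or not (0)


def linear(n_in: int, n_out: int) -> int:
    """Parameters of a dense layer with bias."""
    return n_in * n_out + n_out


def layer_norm(dim: int) -> int:
    """LayerNorm has one scale and one shift vector."""
    return 2 * dim


# Word, position and token type embeddings, followed by a LayerNorm.
embeddings = (VOCAB_SIZE + MAX_POSITIONS + TYPE_VOCAB) * HIDDEN + layer_norm(HIDDEN)
per_layer = (
    4 * linear(HIDDEN, HIDDEN)        # query, key, value, attention output
    + layer_norm(HIDDEN)              # post-attention LayerNorm
    + linear(HIDDEN, INTERMEDIATE)    # feed-forward expansion
    + linear(INTERMEDIATE, HIDDEN)    # feed-forward projection
    + layer_norm(HIDDEN)              # post-feed-forward LayerNorm
)
pooler = linear(HIDDEN, HIDDEN)           # [CLS] pooler
classifier = linear(HIDDEN, NUM_LABELS)   # sequence classification head

total = embeddings + NUM_LAYERS * per_layer + pooler + classifier
print(f"{total:,}")  # 109,483,778
```

The number of attention heads does not appear in the count. The 12 heads split the 768-dimensional query, key, and value projections between them rather than adding weights.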

Intended Uses & Limitations

Intended Uses

  • Paraphrase Detection: This model can be used to determine if two sentences are paraphrases of each other, which is useful in applications like duplicate question detection in forums, semantic search, and text summarization.

  • Education: Demonstrating how to fine-tune a transformer model on a specific downstream task.
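For each sentence pair, the classification head produces two raw scores (logits). The predicted label is the index of the larger score, and a softmax turns the scores into probabilities. The plain-Python sketch below shows that post-processing step. It assumes index 1 means "equivalent", as described above. In Hugging Face Transformers the scores would come from the model output's `logits` field. `predict_paraphrase` is a hypothetical helper, and the logits in the example are made up.

```python
import math
from typing import Sequence, Tuple


def predict_paraphrase(logits: Sequence[float]) -> Tuple[int, float]:
    """Map the two classifier logits to (label, probability of label 1).

    Assumes index 0 = "not equivalent" and index 1 = "equivalent".
    """
    # Numerically stable softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    label = max(range(len(probs)), key=probs.__getitem__)  # argmax
    return label, probs[1]


# Hypothetical logits, for illustration only (not produced by this model).
label, p_equiv = predict_paraphrase([-1.2, 2.3])
print(label, round(p_equiv, 3))
```

For duplicate detection, one option is to compare `p_equiv` against a threshold tuned on held-out data instead of relying on the argmax (equivalent to a 0.5 threshold).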

Limitations

  • Dataset Bias: The MRPC dataset contains sentence pairs from specific news sources, which might introduce bias. The model might not perform well on text from other domains.

  • Context Limitations: The model judges each sentence pair in isolation, without broader document context, so it may misjudge pairs whose meaning depends on the surrounding text.

Training procedure

  • Optimizer: AdamW

  • Learning Rate: 5e-5

  • Epochs: 3

  • Batch Size: 8
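The sketch below relates these hyperparameters to the optimization loop. First, it computes how many optimizer steps they imply, assuming (not stated in this card) that the full 3,668-pair GLUE MRPC training split was used and the last partial batch was kept. Second, it writes out one AdamW update for a single scalar parameter. The betas, epsilon, and weight decay shown are illustrative defaults, not values from this card.

```python
import math

# Hyperparameters listed in this card.
LEARNING_RATE = 5e-5
EPOCHS = 3
BATCH_SIZE = 8

# Assumption: full GLUE MRPC training split, last partial batch kept.
TRAIN_EXAMPLES = 3668

steps_per_epoch = math.ceil(TRAIN_EXAMPLES / BATCH_SIZE)
total_steps = EPOCHS * steps_per_epoch
print(steps_per_epoch, total_steps)


def adamw_step(p, g, m, v, t, lr=LEARNING_RATE, beta1=0.9, beta2=0.999,
               eps=1e-8, weight_decay=0.0):
    """One AdamW update for a scalar parameter p with gradient g.

    beta1, beta2, eps and weight_decay are illustrative defaults (assumed).
    Weight decay is decoupled: it shrinks p directly instead of being
    added to the gradient, which is what distinguishes AdamW from Adam + L2.
    """
    p = p - lr * weight_decay * p          # decoupled weight decay
    m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
    m_hat = m / (1 - beta1 ** t)           # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    p = p - lr * m_hat / (math.sqrt(v_hat) + eps)
    return p, m, v
```

On the first step the bias-corrected ratio `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so each weight moves by roughly the learning rate of 5e-5.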

Evaluation results

  • Accuracy: 0.8505
  • F1: 0.8943
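Both reported metrics are defined over binary predictions. The sketch below computes them with label 1 ("equivalent") as the positive class, which we assume is the convention behind the reported F1. `accuracy_and_f1` is a hypothetical helper, and the labels in the example are toy data rather than this model's predictions.

```python
from typing import Dict, Sequence


def accuracy_and_f1(labels: Sequence[int], preds: Sequence[int]) -> Dict[str, float]:
    """Accuracy and binary F1, treating label 1 ("equivalent") as positive."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    correct = sum(1 for y, p in pairs if y == p)
    accuracy = correct / len(pairs)
    # F1 is the harmonic mean of precision and recall: 2TP / (2TP + FP + FN).
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return {"accuracy": accuracy, "f1": f1}


# Toy labels and predictions, for illustration only.
print(accuracy_and_f1([1, 1, 0, 1, 0], [1, 0, 0, 1, 1]))
```

F1 ignores true negatives, so it can exceed accuracy when most pairs are positive. The card does not state which split was evaluated. If it was the 408-pair GLUE MRPC validation split (an assumption), the reported accuracy corresponds to 347 correct predictions (347 / 408 ≈ 0.8505).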

Model files

  • Format: Safetensors
  • Model size: 109M params
  • Tensor type: F32