bert-base-uncased-finetuned-mrpc-v2
BERT ("bert-base-uncased") fine-tuned on MRPC (Microsoft Research Paraphrase Corpus).
The model predicts whether two sentences are semantically equivalent. It accompanies Section 4 of Chapter 3 of the Hugging Face "NLP Course" (https://huggingface.co/learn/nlp-course/chapter3/4).
It was trained using a custom PyTorch loop with Hugging Face Accelerate.
Code: https://github.com/sambitmukherjee/huggingface-notebooks/blob/main/course/en/chapter3/section4.ipynb
Experiment tracking: https://wandb.ai/sadhaklal/bert-base-uncased-finetuned-mrpc-v2
Usage
from transformers import pipeline
classifier = pipeline("text-classification", model="sadhaklal/bert-base-uncased-finetuned-mrpc-v2")
sentence1 = "A tropical storm rapidly developed in the Gulf of Mexico Sunday and was expected to hit somewhere along the Texas or Louisiana coasts by Monday night ."
sentence2 = "A tropical storm rapidly developed in the Gulf of Mexico on Sunday and could have hurricane-force winds when it hits land somewhere along the Louisiana coast Monday night ."
sentence_pair = sentence1 + " [SEP] " + sentence2
print(classifier(sentence_pair))
sentence1 = "The settling companies would also assign their possible claims against the underwriters to the investor plaintiffs , he added ."
sentence2 = "Under the agreement , the settling companies will also assign their potential claims against the underwriters to the investors , he added ."
sentence_pair = sentence1 + " [SEP] " + sentence2
print(classifier(sentence_pair))
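As an alternative to joining the two sentences with a literal " [SEP] ", the text-classification pipeline also accepts a dictionary with "text" and "text_pair" keys, which lets the tokenizer encode the pair as two segments. The sketch below is a hedged illustration, not the author's method: the helper names make_pair_input and classify_pair are hypothetical, and it assumes the model was fine-tuned on inputs tokenized as sentence pairs, as in the course section linked above.

```python
def make_pair_input(sentence1, sentence2):
    """Package two sentences in the dict form the pipeline accepts for pairs."""
    return {"text": sentence1, "text_pair": sentence2}


def classify_pair(sentence1, sentence2):
    """Classify one sentence pair (hypothetical helper; downloads the model)."""
    # Imported lazily so that building the input does not require transformers.
    from transformers import pipeline

    classifier = pipeline(
        "text-classification",
        model="sadhaklal/bert-base-uncased-finetuned-mrpc-v2",
    )
    # The dict is forwarded to the tokenizer, which inserts [SEP] itself
    # and assigns separate segment ids to the two sentences.
    return classifier(make_pair_input(sentence1, sentence2))
```

Calling classify_pair(sentence1, sentence2) with the sentences defined above should return a label and score for the pair. The scores may differ slightly from the " [SEP] " concatenation, because the segment ids of the second sentence differ between the two encodings.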
Dataset
From the dataset page:
The Microsoft Research Paraphrase Corpus (Dolan & Brockett, 2005) is a corpus of sentence pairs automatically extracted from online news sources, with human annotations for whether the sentences in the pair are semantically equivalent.
Examples: https://huggingface.co/datasets/glue/viewer/mrpc
Metrics
Accuracy on the 'validation' split of MRPC: 0.875
F1 on the 'validation' split of MRPC: 0.9128
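For readers unfamiliar with how these two numbers are defined, here is a minimal pure-Python sketch of accuracy and binary F1, where label 1 ("equivalent") is assumed to be the positive class. The toy labels and predictions are made up for illustration and are not the model's outputs on MRPC.

```python
def accuracy(preds, labels):
    """Fraction of predictions that match the reference labels."""
    correct = sum(p == y for p, y in zip(preds, labels))
    return correct / len(labels)


def f1_binary(preds, labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
    fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
    fn = sum(p != positive and y == positive for p, y in zip(preds, labels))
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (illustrative only): 1 = paraphrase, 0 = not a paraphrase.
toy_labels = [1, 1, 0, 1, 0, 1, 1, 0]
toy_preds = [1, 1, 1, 1, 0, 0, 1, 0]

print(accuracy(toy_preds, toy_labels))
print(f1_binary(toy_preds, toy_labels))
```

Because most MRPC pairs are labeled as paraphrases, F1 on the positive class can sit above accuracy, as it does in the figures reported here.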