---
language: en
license: apache-2.0
base_model: bert-base-uncased
tags:
- generated_from_trainer
- paraphrase-identification
- bert
- glue
- mrpc
metrics:
- accuracy
- f1
datasets:
- glue
model-index:
- name: bert-base-uncased-finetuned-mrpc
  results:
  - task:
      type: text-classification
      name: Paraphrase Identification
    dataset:
      name: GLUE MRPC
      type: glue
      args: mrpc
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.8652
    - name: F1
      type: f1
      value: 0.9057
---

# BERT Fine-tuned on MRPC

This model is a fine-tuned version of [bert-base-uncased](https://huggingface.co/bert-base-uncased) on the MRPC (Microsoft Research Paraphrase Corpus) dataset from the GLUE benchmark. It predicts whether two given sentences are semantically equivalent.

## Model description

The model uses the BERT base architecture (12 layers, 768 hidden dimensions, 12 attention heads) and has been fine-tuned specifically for paraphrase identification. The classification head predicts whether the input sentence pair expresses the same meaning.

Key specifications:

- Base model: bert-base-uncased
- Task type: binary classification (paraphrase / not paraphrase)
- Training method: fine-tuning all layers
- Language: English

## Intended uses & limitations

### Intended uses

- Paraphrase detection
- Semantic similarity assessment
- Duplicate question detection
- Content matching
- Automated text comparison

### Limitations

- Works only with English text
- Performance may degrade on out-of-domain text
- May struggle with complex or nuanced semantic relationships
- Limited to comparing pairs of sentences, not longer texts

## Training and evaluation data

The model was trained on the Microsoft Research Paraphrase Corpus (MRPC) from the GLUE benchmark:

- Training set: 3,668 sentence pairs
- Validation set: 408 sentence pairs
- Each pair is labeled as either paraphrase (1) or non-paraphrase (0)
- Class distribution: approximately 67.4% positive (paraphrase) and 32.6% negative (non-paraphrase)

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a `Trainer`-based reproduction sketch is given further below):

- Learning rate: 3e-05
- Batch size: 8 (train and eval)
- Optimizer: AdamW (betas=(0.9, 0.999), epsilon=1e-08)
- LR scheduler: linear decay
- Number of epochs: 3
- Max sequence length: 512
- Weight decay: 0.01

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1     |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:------:|
| No log        | 1.0   | 459  | 0.3905          | 0.8382   | 0.8878 |
| 0.5385        | 2.0   | 918  | 0.4275          | 0.8505   | 0.8961 |
| 0.3054        | 3.0   | 1377 | 0.5471          | 0.8652   | 0.9057 |

### Framework versions

- Transformers 4.46.2
- PyTorch 2.5.1+cu121
- Datasets 3.1.0
- Tokenizers 0.20.3

## Performance analysis

The model achieves strong performance on the MRPC validation set:

- Accuracy: 86.52%
- F1 score: 90.57%

These metrics indicate that the model identifies paraphrases reliably while keeping a good balance between precision and recall.
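## Reproducing the fine-tuning

A run like the one described above can be approximated with the `Trainer` API. The script below is a minimal sketch, not the exact training script used for this checkpoint: the tokenization call, metric wiring, and `output_dir` are assumptions, while the hyperparameters match the table in the training section.

```python
import numpy as np
import evaluate
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Load MRPC and tokenize the sentence pairs
raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(
        batch["sentence1"], batch["sentence2"], truncation=True, max_length=512
    )

tokenized = raw.map(tokenize, batched=True)

# Accuracy and F1, matching the metrics reported on this card
metric_acc = evaluate.load("accuracy")
metric_f1 = evaluate.load("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": metric_acc.compute(predictions=preds, references=labels)["accuracy"],
        "f1": metric_f1.compute(predictions=preds, references=labels)["f1"],
    }

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Hyperparameters from the "Training hyperparameters" section above
args = TrainingArguments(
    output_dir="bert-base-uncased-finetuned-mrpc",  # assumed output path
    learning_rate=3e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    num_train_epochs=3,
    weight_decay=0.01,
    lr_scheduler_type="linear",
    eval_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,  # enables dynamic padding via the default collator
    compute_metrics=compute_metrics,
)
trainer.train()
```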
## Example usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("real-jiakai/bert-base-uncased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("real-jiakai/bert-base-uncased-finetuned-mrpc")

def check_paraphrase(sentence1, sentence2):
    # Encode the sentence pair; truncation guards against over-long inputs
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    prediction = outputs.logits.argmax(dim=-1).item()
    return "Paraphrase" if prediction == 1 else "Not paraphrase"

# Example usage
sentence1 = "The cat sat on the mat."
sentence2 = "A cat was sitting on the mat."
result = check_paraphrase(sentence1, sentence2)
print(f"Result: {result}")
```
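The snippet above returns a hard label. If you want a confidence score instead, you can softmax the logits; the helper below is a small sketch assuming the same checkpoint and the card's label convention (1 = paraphrase).

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

repo = "real-jiakai/bert-base-uncased-finetuned-mrpc"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForSequenceClassification.from_pretrained(repo)

def paraphrase_probability(sentence1, sentence2):
    # Return the model's probability that the pair is a paraphrase (label 1)
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

score = paraphrase_probability("The cat sat on the mat.", "A cat was sitting on the mat.")
print(f"P(paraphrase) = {score:.3f}")
```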