---
license: apache-2.0
language:
- en
metrics:
- accuracy
pipeline_tag: text-classification
tags:
- code
---

# Model Card for Bert-base-cased Paraphrase Classification

## Model Details

### Model Description

The **bert-base-cased-paraphrase-classification** model is a fine-tuned version of BERT (Bidirectional Encoder Representations from Transformers) designed for paraphrase classification. It uses the cased variant of BERT as the base model and has been fine-tuned to identify whether two input sentences are paraphrases of each other.

- **Developed by:** Rushil Jariwala
- **Model type:** Transformer-based neural network
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model:** BERT-base-cased

### Model Sources

- **Repository:** [Hugging Face Model Hub](https://huggingface.co/rushilJariwala/bert-base-cased-paraphrase-classification)

## Uses

### Direct Use

This model can directly classify whether two sentences are paraphrases of each other.

### Downstream Use

When fine-tuned on a specific task or integrated into a larger application, this model can assist in tasks requiring paraphrase identification.

### Out-of-Scope Use

This model may not perform well on sentences containing highly domain-specific vocabulary not seen during training, and it is limited to English.

## Bias, Risks, and Limitations

The model's performance may vary with how similar input sentences are to those in the training data, and it may inherit biases from that data.

### Recommendations

Users should consider domain-specific fine-tuning for optimal performance in specific applications. Careful evaluation and validation are recommended before use in critical applications.

## How to Get Started with the Model

Use the following Python code to get started with the model. Note that the model scores a sentence *pair*, so both sentences must be passed together rather than as separate inputs:

```python
from transformers import pipeline

pipe = pipeline(
    "text-classification",
    model="rushilJariwala/bert-base-cased-paraphrase-classification",
)

# Pass the two sentences as a pair so they are encoded jointly.
result = pipe({
    "text": "I've been waiting for a HuggingFace course my whole life.",
    "text_pair": "This course is amazing!",
})
print(result)
```

## Training Details

### Training Procedure

#### Preprocessing

The text was tokenized using BERT's cased tokenizer with truncation and padding.

#### Training Hyperparameters

- **Training regime:** [More Information Needed]
- **Batch size:** 8
- **Learning rate:** 5e-5
- **Optimizer:** AdamW
- **Number of epochs:** 3

A fine-tuning sketch combining these settings with the preprocessing described above appears after the Summary.

## Evaluation

#### Testing Data

The model was evaluated on the MRPC validation set.

#### Metrics

- **Accuracy:** 86.27%

#### Summary

The model achieved an accuracy of 86.27% on the MRPC validation set.
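
To sanity-check the accuracy reported above, a minimal evaluation loop over the MRPC validation set might look like the following. The exact evaluation script used for this card is not published, so treat this as an approximation: the GLUE MRPC split, its `sentence1`/`sentence2`/`label` columns, and the argmax decoding are standard, but label order is an assumption inherited from MRPC (0 = not paraphrase, 1 = paraphrase).

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "rushilJariwala/bert-base-cased-paraphrase-classification"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)
model.eval()

# MRPC validation split from the GLUE benchmark on the Hugging Face Hub.
val = load_dataset("glue", "mrpc", split="validation")

correct = 0
for example in val:
    # Encode the sentence pair jointly, truncating to the model's max length.
    inputs = tokenizer(
        example["sentence1"],
        example["sentence2"],
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    # Predicted class is the argmax over the two labels.
    correct += int(logits.argmax(dim=-1).item() == example["label"])

print(f"Accuracy: {correct / len(val):.4f}")  # The card reports 0.8627.
```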
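
For readers who want to reproduce the training setup, here is a minimal fine-tuning sketch that combines the preprocessing and hyperparameters listed above (batch size 8, learning rate 5e-5, AdamW, 3 epochs). This is an assumed reconstruction rather than the original training script: the `Trainer` API, dynamic padding via `DataCollatorWithPadding`, and all unlisted settings are my choices, not confirmed details of how this model was actually trained.

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding,
    Trainer,
    TrainingArguments,
)

# Assumed data source: the GLUE MRPC dataset on the Hugging Face Hub.
raw = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")

def tokenize(batch):
    # Sentence pairs are encoded jointly with truncation; padding is
    # applied dynamically per batch by the data collator below.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized = raw.map(tokenize, batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2
)

# Hyperparameters taken from the card; everything else is a Trainer
# default (Trainer uses AdamW by default).
args = TrainingArguments(
    output_dir="bert-base-cased-paraphrase-classification",
    per_device_train_batch_size=8,
    learning_rate=5e-5,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorWithPadding(tokenizer),
)
trainer.train()
```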