---
language: en
tags:
- textual-entailment
- nli
- pytorch
datasets:
- mnli
license: mit
widget:
- text: "EpCAM is overexpressed in breast cancer. EpCAM is downregulated in breast cancer."
---

# BiomedNLP-PubMedBERT finetuned on textual entailment (NLI)

The [microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext](https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext?text=%5BMASK%5D+is+a+tumor+suppressor+gene) model finetuned on the MNLI dataset. It should be useful for textual entailment tasks involving biomedical corpora.

## Usage

Given two sentences (a premise and a hypothesis), the model outputs the logits of entailment, neutral, or contradiction.

You can test the model using the HuggingFace model widget on the side:
- Input two sentences (premise and hypothesis) one after the other.
- The model returns the probabilities of 3 labels: entailment (LABEL: 0), neutral (LABEL: 1), and contradiction (LABEL: 2), respectively.

To use the model locally on your machine:

```python
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = AutoTokenizer.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli")
model = AutoModelForSequenceClassification.from_pretrained("lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli").to(device)

premise = 'EpCAM is overexpressed in breast cancer'
hypothesis = 'EpCAM is downregulated in breast cancer.'

# run the premise/hypothesis pair through the model finetuned on MNLI
x = tokenizer.encode(premise, hypothesis, return_tensors='pt', truncation='only_first').to(device)
logits = model(x)[0]
probs = logits.softmax(dim=1)
print('Probabilities for entailment, neutral, contradiction \n', np.around(probs.cpu().detach().numpy(), 3))

# Probabilities for entailment, neutral, contradiction
# 0.001 0.001 0.998
```

## Metrics

Evaluation on classification accuracy (entailment, contradiction, neutral) on the MNLI test set:

| Metric   | Value  |
| -------- | ------ |
| Accuracy | 0.8338 |

See Training Metrics tab for detailed info.
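The same inference can also be run through the Hugging Face `text-classification` pipeline. The sketch below is an assumption-based alternative to the snippet above (not from the original card): it relies on the pipeline accepting a `{"text", "text_pair"}` dict for sentence pairs and on `top_k=None` returning scores for all labels, which holds for recent `transformers` versions.

```python
from transformers import pipeline

# Minimal sketch: sentence-pair classification via the text-classification pipeline.
# Label names come from the model config (LABEL_0 = entailment, LABEL_1 = neutral,
# LABEL_2 = contradiction, as described in the Usage section).
classifier = pipeline(
    "text-classification",
    model="lighteternal/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-mnli",
)

result = classifier(
    {"text": "EpCAM is overexpressed in breast cancer",
     "text_pair": "EpCAM is downregulated in breast cancer."},
    top_k=None,  # return scores for all three labels instead of only the top one
)
print(result)
```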