Timofey/PubMedBERT_Diseases_Side_Effects_Context_Classifier

This model is a fine-tuned version of BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext (see its Hugging Face card). It was developed for the web-based ANDDigest system to classify short names of diseases and side effects in text on the basis of their context (a name is considered short if its length is 4 symbols or less). The name being analyzed must be replaced in the text with the <andsystem-candidate> tag.

Input:
Any biomedical text in which the name of the object being classified is replaced with the tag, for example this PubMed abstract:
Neurobrucellosis Presenting with Features of Demyelinating Disorder in a Pediatric Patient. Brucellosis is an endemic disease in Saudi Arabia, which can present with variable clinical manifestations. It is a zoonotic disease transmitted from animals to humans. Brucellosis is a multisystemic disease that can present with any system involvement; And neurobrucellosis is a serious complication, sometimes leading to permanent neurological deficit, if treatment is not started promptly. Herein, we present a 6-year boy with neurobrucellosis, who developed demyelination of cerebral white matter and presented with <andsystem-candidate> and seizures.

In this example, the word "fever" was replaced with <andsystem-candidate>. Keep in mind that the maximum length of an input sequence for BERT is 512 tokens.
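
The replacement step described above can be sketched in a few lines of Python. This is only an illustration, not the ANDDigest preprocessing code: the helper names `mask_candidate` and `is_short_name` are hypothetical, and replacing only the first whole-word occurrence (case-insensitively) is an assumption of this sketch.

```python
import re

# Tag used in this card to mark the name under classification.
TAG = "<andsystem-candidate>"


def is_short_name(name: str) -> bool:
    """A name counts as short when it is 4 symbols long or less."""
    return len(name) <= 4


def mask_candidate(text: str, name: str, tag: str = TAG) -> str:
    """Replace the first whole-word occurrence of `name` with `tag`.

    Replacing only the first occurrence is an assumption of this sketch.
    """
    # Lookarounds keep "fever" from matching inside "feverish".
    pattern = re.compile(r"(?<!\w)" + re.escape(name) + r"(?!\w)", re.IGNORECASE)
    # A callable replacement inserts the tag literally, without escape handling.
    masked, count = pattern.subn(lambda _match: tag, text, count=1)
    if count == 0:
        raise ValueError(f"{name!r} not found in text")
    return masked


sentence = "The patient presented with fever and seizures."
print(mask_candidate(sentence, "fever"))
# prints: The patient presented with <andsystem-candidate> and seizures.
```

The 512-token limit is not handled here; long texts would still need to be truncated or windowed around the tag before classification.
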
Output:
LABEL_0 is the probability of a FALSE recognition, i.e. the context of <andsystem-candidate> does not correspond to the context specific to diseases or side effects.
LABEL_1 is the probability of a TRUE recognition, i.e. the context of <andsystem-candidate> corresponds to the context specific to diseases or side effects.

The optimal LABEL_1 threshold for short names of diseases and side effects was calculated using a gold standard (add link). A short name is accepted when its LABEL_1 probability is >= 0.9999943971633911.
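
As a rough sketch of how the threshold might be applied, the snippet below assumes the scores arrive as a list of `{"label": ..., "score": ...}` dictionaries, which is the shape the `transformers` text-classification pipeline returns when all labels are requested. The function name `is_disease_context` is hypothetical, and the example scores are made up for illustration.

```python
# Threshold for LABEL_1 reported in this card for short names.
THRESHOLD = 0.9999943971633911


def is_disease_context(scores, threshold: float = THRESHOLD) -> bool:
    """Return True when the LABEL_1 probability reaches the threshold.

    `scores` is assumed to be a list of {"label": str, "score": float} dicts.
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return by_label["LABEL_1"] >= threshold


# Made-up scores, for illustration only.
confident = [
    {"label": "LABEL_0", "score": 0.000001},
    {"label": "LABEL_1", "score": 0.999999},
]
borderline = [
    {"label": "LABEL_0", "score": 0.00001},
    {"label": "LABEL_1", "score": 0.99999},
]

print(is_disease_context(confident))   # prints: True
print(is_disease_context(borderline))  # prints: False
```

Note how strict the cutoff is: a LABEL_1 probability of 0.99999 is still rejected.
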

The Matthews correlation coefficient of the model for long names (>= 15 symbols) is 0.990.
The ROC AUC of the model, calculated for short names (<= 4 symbols), is 0.955.
