This model is a fine-tuned version of BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext (Hugging Face card). It was developed for the web-based ANDDigest system to classify short names of drugs and metabolites in text on the basis of their context. A name is considered short if its length is 4 symbols or fewer. Before classification, the analyzed name must be replaced in the text with the <andsystem-candidate> tag.

Input:
Any biomedical text in which the name of the object to be classified is replaced with the tag, for example, this PubMed abstract:
Intermittent obstruction of jejunostomy tube due to Ascaris lumbricoides infection. A 45-year-old Costa Rican woman was seen for a jejunostomy tube malfunction. There was no evidence of tube malposition or intestinal obstruction. During endoscopy, a long worm was retrieved from the distal duodenum; it was later confirmed to be Ascaris lumbricoides. After treatment with <andsystem-candidate>, no further episodes of tube occlusion were observed. This case reminds us of the importance of considering helminthic infections and their atypical manifestations in patients from endemic regions.

In this example, mebendazole was replaced with <andsystem-candidate>. Keep in mind that the maximum input sequence length for BERT is 512 tokens.
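The replacement step described above can be sketched in a few lines of Python. This is a minimal illustration, not the preprocessing used by ANDDigest: the helper name mask_candidate is hypothetical, and replacing every whole-word, case-insensitive occurrence of the name is an assumption, since the card does not say whether one or all mentions should be tagged.

```python
import re

# Tag expected by the model, as given in this card.
TAG = "<andsystem-candidate>"

def mask_candidate(text: str, name: str) -> str:
    """Replace whole-word occurrences of `name` in `text` with TAG.

    Hypothetical helper: matching all occurrences, case-insensitively,
    is an assumption and not something the card specifies.
    """
    # Lookarounds keep a short name from matching inside a longer word,
    # e.g. "ATP" must not match inside "ATPase".
    pattern = re.compile(
        r"(?<!\w)" + re.escape(name) + r"(?!\w)", flags=re.IGNORECASE
    )
    return pattern.sub(TAG, text)

sentence = (
    "After treatment with mebendazole, no further episodes of tube "
    "occlusion were observed."
)
print(mask_candidate(sentence, "mebendazole"))
```

The sketch does not enforce the 512-token limit. Long texts still have to be truncated or windowed around the tag before they are passed to the model.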
Output:
LABEL_0 is the probability of a FALSE recognition, i.e. the context of <andsystem-candidate> does not correspond to a context specific to drugs or metabolites.
LABEL_1 is the probability of a TRUE recognition, i.e. the context of <andsystem-candidate> corresponds to a context specific to drugs or metabolites.
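As a rough illustration of how the two label probabilities relate to each other, the sketch below assumes that the classification head emits two raw scores (logits) and that the probabilities are obtained with a softmax, which is the usual setup for two-label sequence classification. The card does not confirm this detail, and the function names are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def label_probabilities(logits):
    """Map two logits to the LABEL_0 / LABEL_1 probabilities.

    Assumes index 0 is LABEL_0 (FALSE) and index 1 is LABEL_1 (TRUE).
    """
    p0, p1 = softmax(logits)
    return {"LABEL_0": p0, "LABEL_1": p1}

# Illustrative logits only, not real model output.
print(label_probabilities([-2.0, 3.0]))
```

Under this assumption the two probabilities always sum to 1, so a high LABEL_1 value implies a correspondingly low LABEL_0 value.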

The optimal LABEL_1 threshold for short names of drugs or metabolites, calculated using a gold standard (add link), is 0.999992847442627: a short name is treated as a TRUE recognition when its LABEL_1 probability is >= this value.
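Applying the threshold to a prediction could look like the sketch below. It assumes the scores arrive as a list of {"label", "score"} dictionaries, the shape a Hugging Face text-classification pipeline returns when asked for all labels. The function name is hypothetical and the example scores are made up for illustration.

```python
# Threshold reported in this card for short names (LABEL_1 probability).
THRESHOLD = 0.999992847442627

def is_drug_or_metabolite(scores, threshold=THRESHOLD):
    """Return True if the LABEL_1 probability reaches the threshold.

    `scores` is assumed to be a list of dicts such as
    [{"label": "LABEL_0", "score": ...}, {"label": "LABEL_1", "score": ...}].
    """
    by_label = {item["label"]: item["score"] for item in scores}
    return by_label["LABEL_1"] >= threshold

# Made-up scores, not real model output.
example = [
    {"label": "LABEL_0", "score": 0.000001},
    {"label": "LABEL_1", "score": 0.999999},
]
print(is_drug_or_metabolite(example))
```

Because the threshold is so close to 1, a score that looks confident, such as 0.9999, is still rejected. Scores should be compared at full precision and not rounded beforehand.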

The Matthews correlation coefficient of the model for long names (>= 15 symbols) is 0.983.
The ROC AUC of the model for short names (<= 4 symbols) is 0.907.