Biomedical term classifier with SetFit in Spanish

Model description
Intended uses and limitations
How to use
Training
Evaluation
Additional information

Model description

This is a SetFit model trained for multilabel biomedical text classification in Spanish.

Intended uses and limitations

The model classifies medical entities into 21 classes, including diseases, medical procedures, symptoms, and drugs, among others. It does not yet cover some classes, such as body structures.
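Because the classifier is multilabel, a single term can receive more than one class. The sketch below illustrates one common way to turn per-class scores into a label set by thresholding. It is an assumption for illustration only, not the KeyCARE or SetFit implementation: the function name `decode_multilabel`, the class names (a small subset loosely based on the classes listed above), the scores, and the 0.5 threshold are all hypothetical.

```python
# Illustrative subset of classes; the real model uses 21 classes
# whose exact names are not given in this card.
CLASS_NAMES = ["disease", "procedure", "symptom", "drug"]


def decode_multilabel(probabilities, class_names, threshold=0.5):
    """Return every class whose score reaches the threshold.

    Unlike single-label decoding (argmax), zero, one, or several
    classes may be returned for the same input term.
    """
    if len(probabilities) != len(class_names):
        raise ValueError("one probability per class is required")
    return [
        name
        for name, prob in zip(class_names, probabilities)
        if prob >= threshold
    ]


# Hypothetical scores for one term, in the order of CLASS_NAMES.
labels = decode_multilabel([0.9, 0.1, 0.6, 0.2], CLASS_NAMES)
print(labels)
```

A lower threshold would trade precision for recall; the appropriate value depends on the application and is not specified by this card.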

How to use

This model is implemented as part of the KeyCARE library. First install the keycare module, which provides the SetFit classifier:

python -m pip install keycare

You can then run the KeyCARE pipeline that uses the SetFit model:

from keycare.TermExtractor import TermExtractor

# initialize the termextractor object
termextractor = TermExtractor()
# Run the pipeline
text = """Acude al Servicio de Urgencias por cefalea frontoparietal derecha.
Mediante biopsia se diagnostica adenocarcinoma de próstata Gleason 4+4=8 con metástasis óseas múltiples.
Se trata con Ácido Zoledrónico 4 mg iv/4 semanas.
"""
termextractor(text)
# You can also access the class storing the SetFit model
categorizer = termextractor.categorizer

Training

The model has been trained using an efficient few-shot learning technique that involves:

Fine-tuning a Sentence Transformer with contrastive learning. The pre-trained base model is SapBERT-from-roberta-base-biomedical-clinical-es from the BSC-NLP4BIA research group.
Training a classification head with features from the fine-tuned Sentence Transformer.
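The contrastive step above depends on turning a handful of labeled examples into many training pairs. The following is a minimal sketch of that pair construction, assuming a simplified single-label toy setting. It is not the SetFit library's code or the authors' training script; the function name `build_contrastive_pairs`, the example terms, and the label names are hypothetical.

```python
import itertools


def build_contrastive_pairs(examples):
    """Build (text_a, text_b, target) triples from labeled examples.

    Pairs sharing a label get target 1.0 (pull embeddings together);
    pairs with different labels get 0.0 (push embeddings apart).
    """
    pairs = []
    for (text_a, label_a), (text_b, label_b) in itertools.combinations(examples, 2):
        target = 1.0 if label_a == label_b else 0.0
        pairs.append((text_a, text_b, target))
    return pairs


# Toy few-shot set; terms and labels are illustrative only.
few_shot = [
    ("cefalea", "symptom"),
    ("dolor abdominal", "symptom"),
    ("adenocarcinoma de próstata", "disease"),
    ("diabetes mellitus", "disease"),
]

pairs = build_contrastive_pairs(few_shot)
for text_a, text_b, target in pairs:
    print(f"{target:.0f}  {text_a!r} / {text_b!r}")
```

With n labeled examples this yields n(n-1)/2 pairs, which is why a few examples per class can be enough to fine-tune the sentence encoder. The second step then fits a classification head on the embeddings produced by the fine-tuned encoder.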

The training data was drawn from NER gold-standard corpora, also produced by BSC-NLP4BIA, including MedProcNER, DISTEMIST, SympTEMIST, CANTEMIST, and PharmaCoNER, among others.

Evaluation

To be published

Additional information

Author

NLP4BIA at the Barcelona Supercomputing Center

Licensing information

Apache License, Version 2.0

Citation information

To be published

Disclaimer

The models published in this repository are intended for general-purpose use and are available to third parties. These models may contain biases and/or other undesirable distortions.

When third parties deploy or provide systems and/or services to other parties using any of these models (or using systems based on these models), or become users of the models, they should note that it is their responsibility to mitigate the risks arising from their use and, in any event, to comply with applicable regulations, including regulations regarding the use of Artificial Intelligence.
