AIObioEnts: All-in-one biomedical entities

Biomedical named-entity recognition following the all-in-one NER (AIONER) scheme introduced by Luo et al. [1]. This is a straightforward Hugging Face-compatible implementation that does not use a decoding head, which makes it easier to integrate with other pipelines.

For full details, see the main GitHub repository.

Anatomical biomedical entities

We followed the original AIONER training pipeline, which is based on the BioRED dataset together with additional BioRED-compatible datasets, for a set of core entities (Gene, Disease, Chemical, Species, Variant, Cell line). We then fine-tuned the resulting model on a modified version of the latest release of the AnatEM corpus, restricted to a subset of entities that are of interest to us: cell, cell component, tissue, multi-tissue structure, and organ, along with the newly introduced cancer. This model corresponds to the implementation based on BioLinkBERT-large.

F1 scores

The F1 scores on the test set of this modified dataset are shown below:

Entity type               BioLink-large
Cell                      89.28
Cell component            81.23
Tissue                    74.49
Cancer                    88.35
Organ                     81.02
Multi-tissue structure    72.98
Overall                   84.39

Usage

The model can be directly used from HuggingFace in a NER pipeline. However, we note that:

  • The model was trained on sentence-level data, so it works best when the input is split into sentences.
  • Each sentence to tag must be surrounded by the flag corresponding to the entity type one wishes to identify, as in: <entity_type>sentence</entity_type>. In the case of this fine-tuned model, the entity type should be 'ALL'.
  • Since additional 'O' labels are used in the AIONER scheme, the outputs should be postprocessed before aggregating the tags.
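The first two points can be illustrated with a minimal sketch of the input preparation. The function name wrap_sentences and the regex-based sentence splitter are assumptions made for illustration only; they are not taken from the main repository, and a proper sentence tokenizer is likely a better choice for biomedical text containing abbreviations.

```python
import re


def wrap_sentences(text, entity_type="ALL"):
    """Split text into sentences and wrap each one in <entity_type>...</entity_type>."""
    # Naive splitter (illustrative assumption): break on whitespace that
    # follows '.', '!' or '?'. Abbreviations such as "e.g." will be split too.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    # Surround every sentence with the entity-type flag expected by the model.
    return [f"<{entity_type}>{s}</{entity_type}>" for s in sentences]


example = "The tumor invaded the liver. Hepatocytes were analysed."
for tagged in wrap_sentences(example):
    print(tagged)
```

Each returned string is one wrapped sentence that could then be passed to the NER pipeline.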

We provide helper functions to tag individual texts in the main repository:

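from tagging_fn import process_one_text  # helper module from the main repository
from transformers import pipeline

# aggregation_strategy='none' keeps the raw token-level tags; device=0 selects the first GPU
pipe = pipeline('ner', model='SIRIS-Lab/AIObioEnts-AnatEM-biolink-large', aggregation_strategy='none', device=0)
text_to_tag = 'Your biomedical text here.'  # placeholder input
process_one_text(text_to_tag, pipeline=pipe, entity_type='ALL')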
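To give an idea of the postprocessing mentioned in the usage notes, the sketch below merges token-level predictions into entity spans. It is not the repository's implementation: the function name merge_tags is hypothetical, and it assumes BIO-style labels such as 'B-Cell' and 'I-Cell', with outside labels written either as 'O' or as 'O-<type>' (for example 'O-ALL'). The input dictionaries follow the format returned by a transformers 'ner' pipeline with aggregation_strategy='none'; the token list in the example is hand-made rather than real model output.

```python
def merge_tags(tokens, text):
    """Merge token-level BIO predictions into entity spans.

    tokens: list of dicts with 'entity', 'start' and 'end' keys.
    text:   the exact string that was passed to the pipeline.
    """
    spans = []
    current = None
    for tok in tokens:
        label = tok["entity"]
        # Assumption: 'O' and 'O-<type>' both mean "outside any entity".
        if label == "O" or label.startswith("O-"):
            current = None
            continue
        prefix, _, ent_type = label.partition("-")
        if prefix == "I" and current is not None and current["type"] == ent_type:
            # Continuation of the open span: extend its end offset.
            current["end"] = tok["end"]
        else:
            # 'B-' tag (or an 'I-' tag with no matching open span): start a new span.
            current = {"type": ent_type, "start": tok["start"], "end": tok["end"]}
            spans.append(current)
    for span in spans:
        span["text"] = text[span["start"]:span["end"]]
    return spans


# Hand-made token predictions for illustration only.
text = "<ALL>Hepatocytes line the liver sinusoids.</ALL>"
tokens = [
    {"entity": "B-Cell", "start": 5, "end": 16},
    {"entity": "O-ALL", "start": 17, "end": 21},
    {"entity": "O-ALL", "start": 22, "end": 25},
    {"entity": "B-Tissue", "start": 26, "end": 31},
    {"entity": "I-Tissue", "start": 32, "end": 41},
]
for span in merge_tags(tokens, text):
    print(span["type"], span["text"])
```

Note that the character offsets refer to the wrapped sentence, including the leading flag, so they need to be shifted if positions in the original text are required. Subword tokens are not given special treatment here.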

References

[1] Ling Luo, Chih-Hsuan Wei, Po-Ting Lai, Robert Leaman, Qingyu Chen, and Zhiyong Lu. "AIONER: All-in-one scheme-based biomedical named entity recognition using deep learning." Bioinformatics, Volume 39, Issue 5, May 2023, btad310.
