---
license: mit
datasets:
- bigbio/biored
language:
- en
metrics:
- f1
---

# Model Card for BioNExt

BioNExt is an end-to-end biomedical relation extraction and classification system. It comprises three modules: a Tagger (Named Entity Recognition), a Linker (Entity Linking), and an Extractor (Relation Extraction and Classification).

This repository contains two models:

1. **Tagger:** Named Entity Recognition module, which performs 6-class biomedical NER: **Genes, Diseases, Chemicals, Variants (mutations), Species, and Cell Lines**.
2. **Extractor:** Relation extraction and classification module. The relation classes are: **Positive Correlation, Negative Correlation, Association, Binding, Drug Interaction, Cotreatment, Comparison, and Conversion.**
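
The two label sets listed above can be written down as plain constants, which may help when post-processing predictions. This is only a minimal sketch: the names `ENTITY_TYPES`, `RELATION_TYPES`, and `Relation` are illustrative assumptions, and the label strings stored in the released checkpoints may be spelled differently from the names used in this card.

```python
from dataclasses import dataclass

# Entity classes predicted by the Tagger, as named in this model card.
# ASSUMPTION: the exact label strings used by the checkpoints may differ.
ENTITY_TYPES = (
    "Gene",
    "Disease",
    "Chemical",
    "Variant",
    "Species",
    "Cell Line",
)

# Relation classes predicted by the Extractor, as named in this model card.
RELATION_TYPES = (
    "Positive Correlation",
    "Negative Correlation",
    "Association",
    "Binding",
    "Drug Interaction",
    "Cotreatment",
    "Comparison",
    "Conversion",
)


@dataclass(frozen=True)
class Relation:
    """Hypothetical container for one extracted relation between two entities."""

    head_id: str  # identifier of the first linked entity (placeholder format)
    tail_id: str  # identifier of the second linked entity (placeholder format)
    label: str    # one of RELATION_TYPES

    def __post_init__(self):
        # Reject labels outside the eight classes listed in the card.
        if self.label not in RELATION_TYPES:
            raise ValueError(f"unknown relation class: {self.label!r}")


# Example with placeholder entity identifiers.
example = Relation(head_id="entity-1", tail_id="entity-2", label="Association")
print(example)
```

Validating the label at construction time keeps typos in class names from silently propagating into downstream evaluation.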

For a full description of how to use the end-to-end pipeline, see our [GitHub](https://github.com/ieeta-pt/BioNExt) repository.

- **Developed by:** IEETA
- **Model type:** BERT Base
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** BioLinkBERT-Large

### Model Sources

- **Repository:** [IEETA BioNExt GitHub](https://github.com/ieeta-pt/BioNExt)
- **Paper:** Towards Discovery: An End-to-End System for Uncovering Novel Biomedical Relations [Awaiting Publication]

**Authors:**

- Tiago Almeida ([ORCID: 0000-0002-4258-3350](https://orcid.org/0000-0002-4258-3350))
- Richard A A Jonker ([ORCID: 0000-0002-3806-6940](https://orcid.org/0000-0002-3806-6940))
- Rui Antunes ([ORCID: 0000-0003-3533-8872](https://orcid.org/0000-0003-3533-8872))
- João R Almeida ([ORCID: 0000-0003-0729-2264](https://orcid.org/0000-0003-0729-2264))
- Sérgio Matos ([ORCID: 0000-0003-1941-3983](https://orcid.org/0000-0003-1941-3983))

## Uses

The model is intended for academic purposes only. We accept no liability for its use in any professional or medical setting.

## How to Get Started with the Model

Please refer to our GitHub repository for more information on our end-to-end inference pipeline: [IEETA BioNExt GitHub](https://github.com/ieeta-pt/BioNExt)

## Training Data

The models were trained on the BioRED corpus, within the scope of the BioCreative-VIII challenge.

Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, Zhiyong Lu, BioRED: a rich biomedical relation extraction dataset, Briefings in Bioinformatics, Volume 23, Issue 5, September 2022, bbac282, https://doi.org/10.1093/bib/bbac282

## Results

Evaluated as an end-to-end system, our F1 scores (%) are as follows:

- **Tagger**: 43.10
- **Linker**: 32.46
- **Extractor**: 24.59

| Configuration        | Entity Pair (P/R/F %) | + Relation (P/R/F %) | + Novel (P/R/F %) |
|----------------------|-----------------------|----------------------|-------------------|
| Competition best     | -/-/55.84             | -/-/43.03            | -/-/32.75         |
| BioNExt (end-to-end) | 45.89/40.63/43.10     | 34.56/30.60/32.46    | 26.18/23.18/24.59 |
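
The F values in the table are consistent with the standard F1 definition, the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes F1 from the reported P and R values for the BioNExt row; the helper name `f1_score` is our own choice for illustration and is not taken from the BioNExt codebase.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for BioNExt (end-to-end), copied from the table.
reported = {
    "Entity Pair": (45.89, 40.63),
    "+ Relation": (34.56, 30.60),
    "+ Novel": (26.18, 23.18),
}

for level, (p, r) in reported.items():
    print(f"{level}: F1 = {f1_score(p, r):.2f}")
```

Rounded to two decimals, this reproduces the reported 43.10, 32.46, and 24.59.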

## Citation

**BibTeX:**

[Awaiting Publication]