---
license: mit
datasets:
- bigbio/biored
language:
- en
metrics:
- f1
---


# Model Card for BioNExt

BioNExt is an end-to-end biomedical relation extraction and classification system. It comprises three modules: a Tagger (named entity recognition), a Linker (entity linking), and an Extractor (relation extraction and classification).

This repository contains two models:

1. **Tagger:** Named entity recognition module, which performs six-class biomedical NER: **Genes, Diseases, Chemicals, Variants (mutations), Species, and Cell Lines**.
2. **Extractor:** Relation extraction and classification module. The relation classes are: **Positive Correlation, Negative Correlation, Association, Binding, Drug Interaction, Cotreatment, Comparison, and Conversion**.
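As a rough illustration of the label space described above, the entity and relation classes could be written down as follows. This is only a sketch: the identifier names (`ENTITY_TYPES`, `RELATION_TYPES`, `is_valid_relation`) are hypothetical, and the exact label strings used by the released models may differ.

```python
# Hypothetical label sets mirroring the classes listed in this model card.
# The actual label strings in the model configuration may differ.
ENTITY_TYPES = (
    "Gene",
    "Disease",
    "Chemical",
    "Variant",
    "Species",
    "CellLine",
)

RELATION_TYPES = (
    "Positive Correlation",
    "Negative Correlation",
    "Association",
    "Binding",
    "Drug Interaction",
    "Cotreatment",
    "Comparison",
    "Conversion",
)


def is_valid_relation(label: str) -> bool:
    """Return True if `label` is one of the eight relation classes."""
    return label in RELATION_TYPES


if __name__ == "__main__":
    print(len(ENTITY_TYPES), "entity classes")
    print(len(RELATION_TYPES), "relation classes")
```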

For a full description of how to use our end-to-end pipeline, see our [GitHub](https://github.com/ieeta-pt/BioNExt) repository.


- **Developed by:** IEETA
- **Model type:** BERT Base
- **Language(s) (NLP):** English
- **License:** MIT
- **Finetuned from model:** BioLinkBERT-Large 

### Model Sources

- **Repository:** [IEETA BioNExt GitHub](https://github.com/ieeta-pt/BioNExt)
- **Paper:** Towards Discovery: An End-to-End System for Uncovering Novel Biomedical Relations [Awaiting Publication]

**Authors:**
- Tiago Almeida ([ORCID: 0000-0002-4258-3350](https://orcid.org/0000-0002-4258-3350))
- Richard A A Jonker ([ORCID: 0000-0002-3806-6940](https://orcid.org/0000-0002-3806-6940))
- Rui Antunes ([ORCID: 0000-0003-3533-8872](https://orcid.org/0000-0003-3533-8872))
- João R Almeida ([ORCID: 0000-0003-0729-2264](https://orcid.org/0000-0003-0729-2264))
- Sérgio Matos ([ORCID: 0000-0003-1941-3983](https://orcid.org/0000-0003-1941-3983))


## Uses

We accept no liability for use of the model in any professional or medical setting. The model is intended for academic purposes only.

## How to Get Started with the Model

Please refer to our GitHub repository for more information on our end-to-end inference pipeline: [IEETA BioNExt GitHub](https://github.com/ieeta-pt/BioNExt)


## Training Data

The models were trained on the BioRED corpus, within the scope of the BioCreative-VIII challenge.

Ling Luo, Po-Ting Lai, Chih-Hsuan Wei, Cecilia N Arighi, Zhiyong Lu, BioRED: a rich biomedical relation extraction dataset, Briefings in Bioinformatics, Volume 23, Issue 5, September 2022, bbac282, https://doi.org/10.1093/bib/bbac282


## Results

Evaluated as an end-to-end system, BioNExt obtains the following F1 scores (%):
- **Tagger**: 43.10
- **Linker**: 32.46
- **Extractor**: 24.59

| Configuration        | Entity Pair (P/R/F %) | + Relation (P/R/F %) | + Novel (P/R/F %) |
|----------------------|-----------------------|----------------------|-------------------|
| Competition best     | -/-/55.84             | -/-/43.03            | -/-/32.75         |
| BioNExt (end-to-end) | 45.89/40.63/43.10     | 34.56/30.60/32.46    | 26.18/23.18/24.59 |
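
The F values in the table are consistent with the usual F1 definition, the harmonic mean of precision and recall. The short check below recomputes them from the reported P and R values. It is a sketch for the reader's convenience, not the official BioCreative-VIII evaluation script, and the function name `f1_score` is our own.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs for BioNExt (end-to-end), taken from the table.
reported = {
    "Entity Pair": (45.89, 40.63),
    "+ Relation": (34.56, 30.60),
    "+ Novel": (26.18, 23.18),
}

for setting, (p, r) in reported.items():
    print(f"{setting}: F1 = {f1_score(p, r):.2f}")
```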


## Citation

**BibTeX:**

[Awaiting Publication]