---
license: cc-by-4.0
language:
- en
metrics:
- f1
- accuracy
library_name: transformers
pipeline_tag: text-classification
---
|
|
|
# lncrna-biocontext
|
This model classifies whether a given paper abstract discusses a long non-coding RNA (lncRNA) in the context of disease.
|
|
|
The model was trained on data from [lncBook-Wiki](https://ngdc.cncb.ac.cn/lncbook/) about papers that experts have curated according to the biological context they discuss. We collected the abstracts of these papers and simplified the classification to disease vs. not disease. We then fine-tuned a [Longformer](https://huggingface.co/allenai/longformer-base-4096) model to perform this binary classification.
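This card does not document exactly how the curated contexts were collapsed into two classes. As a minimal sketch, assuming each curated paper carries a list of context names, and that a paper counts as "disease" if any of its contexts is a disease, the binary target could be derived as below. The `CuratedPaper` record, the `DISEASE_CONTEXTS` set, and `to_binary_label` are all hypothetical illustrations, not the lncBook-Wiki schema or the authors' code.

```python
from dataclasses import dataclass, field


@dataclass
class CuratedPaper:
    # Hypothetical record shape; the real lncBook-Wiki data may be structured differently.
    abstract: str
    contexts: list[str] = field(default_factory=list)  # expert-curated context names


# Hypothetical, illustrative set of context names counted as "disease".
DISEASE_CONTEXTS = {"lung adenocarcinoma"}


def to_binary_label(paper: CuratedPaper) -> int:
    """Return 1 (disease) if any curated context is a disease, else 0 (not disease)."""
    return int(any(c.strip().lower() in DISEASE_CONTEXTS for c in paper.contexts))


papers = [
    CuratedPaper("Placeholder abstract.", ["Lung adenocarcinoma"]),
    CuratedPaper("Placeholder abstract.", ["localisation"]),
]
labels = [to_binary_label(p) for p in papers]
print(labels)  # [1, 0]
```

The "any context is a disease" rule is one possible choice; a paper tagged with both a disease and a non-disease context (e.g. localisation) is labelled as disease under this assumption.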
|
|
|
The model achieves the following results on the test set:
|
|
|
| Metric | Score |
|-|-|
| Accuracy | 0.84 |
| F1 | 0.82 |
| ROC AUC | 0.98 |
|
|
|
Note, however, that the test set is small: only 59 examples, 22 of which discuss disease.
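The card does not state how F1 is averaged or exactly how the ROC score was computed. The sketch below assumes binary F1 with "disease" (label 1) as the positive class and ROC AUC over predicted disease probabilities. It is written in plain Python to stay self-contained; for binary labels, scikit-learn's `accuracy_score`, `f1_score` and `roc_auc_score` compute the same quantities. The toy predictions are illustrative only. The last lines compute the majority-class baseline implied by the test-set composition: always predicting "not disease" is correct on 37 of the 59 abstracts.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_f1(y_true, y_pred, positive=1):
    """F1 for the positive class (assumed here to be 1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores, positive=1):
    """Probability that a random positive is scored above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy predictions, for illustration only (not the model's real outputs).
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
p_disease = [0.9, 0.1, 0.4, 0.3]
print(accuracy(y_true, y_pred), round(binary_f1(y_true, y_pred), 2), roc_auc(y_true, p_disease))
# 0.75 0.67 1.0

# Majority-class baseline from the test-set composition: 59 abstracts, 22 about disease.
n_total, n_disease = 59, 22
print(round((n_total - n_disease) / n_total, 2))  # 0.63
```

With only 59 examples, each misclassified abstract shifts accuracy by about 1.7 percentage points (1/59 ≈ 0.017), so small differences in the scores above should not be over-interpreted.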
|
|
|
As a next step, we aim to classify both the specific disease a paper discusses (e.g. lung adenocarcinoma) and its non-disease context (e.g. localisation).
|
|
|
|
|
|