---
license: cc-by-4.0
language:
- en
metrics:
- f1
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# lncrna-biocontext
This model classifies whether a given abstract discusses an lncRNA in the context of disease.
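
A minimal inference sketch, assuming the model is loaded from the Hugging Face Hub with `transformers` and PyTorch. The repository id below is a placeholder, and which label index corresponds to "disease" is an assumption that should be checked against the model's `id2label` config.

```python
import math

# Placeholder: replace with the actual Hub repository id of this model.
MODEL_ID = "your-namespace/lncrna-biocontext"


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_abstract(abstract, model_id=MODEL_ID, max_length=4096):
    """Return a {label: probability} dict for one abstract."""
    # Imported lazily so the softmax helper works without these packages installed.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # The Longformer base checkpoint accepts up to 4096 tokens.
    inputs = tokenizer(
        abstract, truncation=True, max_length=max_length, return_tensors="pt"
    )
    with torch.no_grad():
        logits = model(**inputs).logits[0].tolist()

    probs = softmax(logits)
    # Label names come from the model config; verify which one means "disease".
    return {model.config.id2label[i]: p for i, p in enumerate(probs)}
```

Calling `classify_abstract("...")` with the text of an abstract downloads the weights on first use and returns one probability per class.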

The model was trained on data from [lncBook-Wiki](https://ngdc.cncb.ac.cn/lncbook/), where experts have
curated papers according to the biological context they discuss. We collected the
abstracts for these papers and simplified the classification into disease/not disease. We then fine-tuned a
[longformer](https://huggingface.co/allenai/longformer-base-4096) model to perform the binary classification.
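
The label simplification described above could look roughly like the sketch below. The context names and the toy records are made up for illustration; the actual lncBook-Wiki vocabulary and our preprocessing code may differ.

```python
# Hypothetical context names for illustration; the actual lncBook-Wiki
# vocabulary is not given in this card and may differ.
DISEASE_CONTEXTS = {"disease"}


def to_binary_label(contexts):
    """Return 1 ("disease") if any curated context is disease-related, else 0."""
    normalised = {c.strip().lower() for c in contexts}
    return int(bool(normalised & DISEASE_CONTEXTS))


# Toy records (made up): each paper carries one or more curated contexts.
papers = [
    {"pmid": "example-1", "contexts": ["Disease", "expression"]},
    {"pmid": "example-2", "contexts": ["localisation"]},
]
labels = [to_binary_label(p["contexts"]) for p in papers]
print(labels)  # [1, 0]
```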

On the test set, the model achieves the following scores:

| Metric | Score |
|-|-|
| Accuracy | 0.84 |
| F1 | 0.82 |
| ROC AUC | 0.98 |

Note, however, that the test set is small: only 59 examples, 22 of which discuss disease.
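
To give a feel for how much uncertainty a 59-example test set carries, here is a rough sketch that computes a 95% Wilson score interval for the reported accuracy, along with the accuracy of always predicting the majority class. This is an illustration added for context, not part of the original evaluation.

```python
import math


def wilson_interval(p_hat, n, z=1.96):
    """Wilson score interval for a proportion p_hat observed on n samples."""
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half, centre + half


N_TEST = 59       # test set size reported above
N_DISEASE = 22    # test examples that discuss disease
ACCURACY = 0.84   # reported accuracy

low, high = wilson_interval(ACCURACY, N_TEST)
# Accuracy of a classifier that always predicts the larger class (not disease).
majority_baseline = max(N_DISEASE, N_TEST - N_DISEASE) / N_TEST

print(f"95% interval for accuracy: {low:.2f} to {high:.2f}")
print(f"majority-class baseline: {majority_baseline:.2f}")
```

With these numbers the interval spans roughly 0.73 to 0.91, and the majority-class baseline is about 0.63, so the scores above should be read as indicative rather than precise.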

The next step is to classify both the specific disease (e.g. lung adenocarcinoma) and the non-disease
context (e.g. localisation) that a paper discusses.