---
license: cc-by-4.0
language:
  - en
metrics:
  - f1
  - accuracy
library_name: transformers
pipeline_tag: text-classification
---

lncrna-biocontext

This model classifies whether a given paper abstract discusses a long non-coding RNA (lncRNA) in the context of disease.

The model was trained on data from lncBook-Wiki, where experts have curated papers according to the biological context they discuss. We collected the abstracts of these papers, simplified the curated contexts into a binary disease / not-disease label, and fine-tuned a Longformer model for this binary classification.
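The label simplification described above can be sketched as follows. This is a minimal illustration, not the actual preprocessing code. The category name `"disease"`, the `(abstract, contexts)` record layout, and the rule that a paper counts as disease-related if any of its curated contexts is a disease context are all assumptions.

```python
# Hypothetical sketch: collapse expert-curated context labels into a binary target.
# ASSUMPTION: the category name and record layout below are illustrative,
# not lncBook-Wiki's actual schema.
DISEASE_CONTEXTS = {"disease"}  # assumed name of the curated disease category


def to_binary_label(contexts: list[str]) -> int:
    """Return 1 (disease) if any curated context is a disease context, else 0."""
    return int(any(c.strip().lower() in DISEASE_CONTEXTS for c in contexts))


# Hypothetical curated records: (abstract text, list of curated contexts).
records = [
    ("abstract text A", ["Disease"]),
    ("abstract text B", ["Localisation"]),
    ("abstract text C", ["Localisation", "Disease"]),
]

# Binary dataset of (abstract, label) pairs ready for a sequence classifier.
dataset = [(text, to_binary_label(ctx)) for text, ctx in records]
print([label for _, label in dataset])  # [1, 0, 1]
```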

On the test set, the model achieves the following results:

| Metric | Score |
|---|---|
| Accuracy | 0.84 |
| F1 | 0.82 |
| ROC AUC | 0.98 |

Note, however, that the test set contains only 59 examples, 22 of which discuss disease, so these scores should be treated with caution.
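To put these scores in context, here is a minimal pure-Python sketch of how the three metrics are commonly defined, applied to trivial baselines on a test set with the stated composition (22 disease, 37 not disease). It assumes that "ROC" means ROC AUC and that F1 is computed for the disease class; neither is stated explicitly. The function names are illustrative and are not the evaluation code actually used. The Wilson interval treats the reported 0.84 accuracy as a binomial proportion over 59 examples.

```python
import math


def accuracy(y_true, y_pred):
    # Fraction of predictions that match the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(y_true, y_pred):
    # ASSUMPTION: F1 for the positive ("disease") class, i.e. the harmonic
    # mean of precision and recall, written as 2TP / (2TP + FP + FN).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def roc_auc(y_true, scores):
    # Probability that a random positive outscores a random negative (ties count 0.5).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def wilson_interval(p_hat, n, z=1.96):
    # Approximate 95% Wilson score interval for a proportion such as accuracy.
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


# Test set composition from the model card: 59 abstracts, 22 about disease.
y_true = [1] * 22 + [0] * 37

# Trivial baselines for reference.
all_not_disease = [0] * 59
all_disease = [1] * 59
print(accuracy(y_true, all_not_disease))  # 37/59, about 0.627
print(f1(y_true, all_disease))            # 44/81, about 0.543
print(roc_auc(y_true, [0.5] * 59))        # 0.5 for a constant score
print(wilson_interval(0.84, 59))          # roughly (0.73, 0.91)
```

On a test set of this size, the interval around an accuracy of 0.84 is wide, which is why the small sample matters when reading the table.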

The next step is to classify the specific context a paper discusses, covering both the particular disease (e.g. lung adenocarcinoma) and non-disease contexts (e.g. localisation).