---
license: cc-by-4.0
language:
- en
metrics:
- f1
- accuracy
library_name: transformers
pipeline_tag: text-classification
---

# lncrna-biocontext

This model determines whether a given abstract discusses an lncRNA in the context of disease.

The model was trained on data from [lncBook-Wiki](https://ngdc.cncb.ac.cn/lncbook/), whose papers have been curated by experts according to the biological context they discuss. We collected the abstracts for these papers and simplified the classification into disease/not disease. We then fine-tuned a [longformer](https://huggingface.co/allenai/longformer-base-4096) model for binary classification.

On the test set we obtain the following results:

| Metric | Score |
|-|-|
| Accuracy | 0.84 |
| F1 | 0.82 |
| ROC AUC | 0.98 |

Note that the test set contains only 59 examples, 22 of which discuss disease, so these scores should be read with caution.

The next step is to classify both the specific disease (e.g. lung adenocarcinoma) and the non-disease context (e.g. localisation) that a paper discusses.
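
To make the caveat about the 59-example test set more concrete, here is a minimal sketch that puts an approximate 95% confidence interval around the reported accuracy. It uses the Wilson score interval, which is a choice made for this illustration and not part of the original evaluation. The function name `wilson_interval` is hypothetical, and the only inputs taken from this card are the accuracy (0.84) and the test set size (59).

```python
import math
from typing import Tuple


def wilson_interval(p_hat: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion p_hat observed on n examples.

    z = 1.96 corresponds to an approximate 95% confidence level.
    """
    if n <= 0:
        raise ValueError("n must be a positive integer")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must lie in [0, 1]")

    z2 = z * z
    denom = 1.0 + z2 / n
    # Centre of the interval is pulled slightly towards 0.5 for small n.
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Values reported in this card: accuracy 0.84 on a test set of 59 abstracts.
low, high = wilson_interval(0.84, 59)
print(f"Approximate 95% interval for accuracy: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval spans roughly 0.73 to 0.91, which is why the scores are best treated as indicative until a larger test set is available.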