BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet

A model for detecting gene-disease associations in abstracts. It classifies each example as 0 (no association) or 1 (association).
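To turn the raw classifier output into one of these two labels, a softmax over the logits followed by an argmax is the usual approach. The sketch below assumes the classification head returns two logits in label order [0, 1]. The card does not state this. With Transformers, one row of the `.logits` tensor from `AutoModelForSequenceClassification` could be passed in as a plain list. The helpers `softmax` and `predict_label` are hypothetical names used only for illustration, and the logits in the example are made up.

```python
import math
from typing import List, Sequence, Tuple

# Assumption: the classification head returns two logits ordered as
# [label 0 = no association, label 1 = association].
LABEL_NAMES = {0: "no association", 1: "association"}


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max logit before exponentiating."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(logits: Sequence[float]) -> Tuple[int, float]:
    """Return (predicted label, probability of label 1) for one example."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=lambda i: probs[i])
    return label, probs[1]


# Example with made-up logits (not real model output):
label, p_assoc = predict_label([-1.2, 2.3])
print(label, LABEL_NAMES[label], round(p_assoc, 3))  # prints: 1 association 0.971
```

The probability of label 1 is returned alongside the label because a ranking metric such as AUC needs a continuous score rather than a hard 0/1 decision.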

This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext on the RENET2 dataset. Only the abstract data from RENET2 was used; the full-text data was not.

It achieves the following results on the evaluation set:

Loss: 0.7226
Precision: 0.7799
Recall: 0.8211
F1: 0.8
Accuracy: 0.8641
AUC: 0.9325
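The sketch below shows how these metrics are conventionally computed for a binary classifier. It treats label 1 as the positive class and computes AUC as ROC AUC over predicted probabilities for label 1. Both are assumptions, since the card does not say how the metrics were produced. The function names are hypothetical. As a consistency check, F1 is the harmonic mean of precision and recall, and the reported precision and recall do yield the reported F1.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, fn, tn), treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy from hard 0/1 predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
        "accuracy": (tp + tn) / len(y_true),
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the chance a random positive outscores a random negative (ties = 0.5).

    Runs in O(positives * negatives): fine for a sketch, slow for large sets.
    """
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# The reported F1 is consistent with the reported precision and recall:
print(round(f1_from(0.7799, 0.8211), 4))  # prints: 0.8
```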

Training procedure

The abstract dataset from RENET2 was split into 85% training and 15% evaluation data, grouped by PMID and stratified by label. That is, no PMID contributed data to both the training and the evaluation set.
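A grouped, stratified split of this kind can be sketched in plain Python. Every PMID is assigned as a whole to either train or eval, so no abstract leaks across the split. The card does not say exactly how stratification was combined with grouping. The sketch therefore approximates it at the PMID level: a PMID's stratum is 1 if any of its examples is labelled 1, otherwise 0. This assumption, the 15% fraction being applied to PMID counts rather than example counts, and the record schema (`"pmid"`, `"label"` keys) are all hypothetical. scikit-learn's `StratifiedGroupKFold` (available from version 1.0) is a library alternative.

```python
import random
from collections import defaultdict
from typing import Any, Dict, List, Tuple

# Hypothetical record schema: {"pmid": str, "label": int, ...}
Record = Dict[str, Any]


def grouped_stratified_split(
    records: List[Record], eval_frac: float = 0.15, seed: int = 1
) -> Tuple[List[Record], List[Record]]:
    """Split records so that each PMID lands entirely in train or in eval.

    Stratification is approximated per PMID (assumption): a PMID's stratum is 1
    if any of its records has label 1, else 0. Each stratum contributes roughly
    eval_frac of its PMIDs to the eval set.
    """
    labels_by_pmid: Dict[str, List[int]] = defaultdict(list)
    for r in records:
        labels_by_pmid[r["pmid"]].append(r["label"])

    strata: Dict[int, List[str]] = defaultdict(list)
    for pmid, labels in labels_by_pmid.items():
        strata[int(any(lab == 1 for lab in labels))].append(pmid)

    rng = random.Random(seed)
    eval_pmids = set()
    for stratum in sorted(strata):
        pmids = sorted(strata[stratum])  # fixed order so the seed fully determines the shuffle
        rng.shuffle(pmids)
        n_eval = round(len(pmids) * eval_frac)
        eval_pmids.update(pmids[:n_eval])

    train = [r for r in records if r["pmid"] not in eval_pmids]
    evaluation = [r for r in records if r["pmid"] in eval_pmids]
    return train, evaluation
```

Because the split is made on PMIDs, the eval share measured in examples can drift from exactly 15% when abstracts contain different numbers of examples.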

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 16
eval_batch_size: 16
seed: 1
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 5
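The hyperparameters above map onto keyword arguments of `transformers.TrainingArguments` as shown below. Pass them as `TrainingArguments(output_dir=..., **TRAINING_KWARGS)`, since `output_dir` is required. The mapping assumes a single device and no gradient accumulation, so the batch size of 16 is treated as the per-device size. The card does not report warmup, so the linear schedule sketch assumes zero warmup steps. The helper functions are hypothetical illustrations of how a linear schedule decays the learning rate to zero over the run, not the library's code.

```python
import math

# Card hyperparameters as keyword arguments for transformers.TrainingArguments
# (assumes one device and no gradient accumulation).
TRAINING_KWARGS = {
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 16,
    "per_device_eval_batch_size": 16,
    "seed": 1,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 5,
}


def total_training_steps(n_train: int, batch_size: int = 16, epochs: int = 5) -> int:
    """Optimizer steps when the last partial batch of each epoch is kept."""
    return math.ceil(n_train / batch_size) * epochs


def linear_lr(step: int, total_steps: int, base_lr: float = 5e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then linear decay to 0 at total_steps.

    warmup_steps=0 is an assumption; the card does not report warmup.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

For example, with a hypothetical 1,000 training examples, `total_training_steps(1000)` gives 63 steps per epoch, or 315 in total. The learning rate then falls from 5e-05 at step 0 to 0 at step 315.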

Framework versions

Transformers 4.9.0.dev0
Pytorch 1.10.0.dev20210630+cu113
Datasets 1.8.0
Tokenizers 0.10.3