
BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext-finetuned-renet

A model for detecting gene-disease associations from abstracts. It outputs label 0 for no association or label 1 for an association.
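As a rough illustration of the 0/1 labelling, the sketch below turns a pair of raw classifier scores into a label and a probability. It assumes the model exposes two logits, one per class, as sequence-classification heads usually do; the card does not confirm this, and the `ID2LABEL` mapping and `decode` helper are hypothetical names introduced here.

```python
import math

# Hypothetical mapping that mirrors the 0/1 labels described in this card.
ID2LABEL = {0: "no association", 1: "association"}


def decode(logits):
    """Convert two raw class scores into (label id, label name, probability)."""
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the one with the highest probability.
    pred = max(range(len(probs)), key=probs.__getitem__)
    return pred, ID2LABEL[pred], probs[pred]


# Example with made-up logits for a single gene-disease pair.
label_id, label_name, confidence = decode([0.2, 1.5])
print(label_id, label_name, round(confidence, 3))
```

The probability of class 1 is also the kind of score a ranking metric such as AUC would be computed from.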

This model is a fine-tuned version of microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext on the RENET2 dataset. Note that it was trained only on the abstract data from RENET2, not on the full-text data.

It achieves the following results on the evaluation set:

  • Loss: 0.7226
  • Precision: 0.7799
  • Recall: 0.8211
  • F1: 0.8
  • Accuracy: 0.8641
  • AUC: 0.9325
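The reported F1 can be checked against the reported precision and recall, since F1 is their harmonic mean. The helper below is a minimal sketch for that check; the function name is an illustration, not part of the original evaluation code.

```python
def f1_from_precision_recall(precision, recall):
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values taken from the evaluation results listed above.
f1 = f1_from_precision_recall(0.7799, 0.8211)
print(round(f1, 4))  # 0.8, consistent with the reported F1
```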

Training procedure

The abstract dataset from RENET2 was split into 85% training and 15% evaluation data, grouped by PMID and stratified by label. That is, no PMID appears in both the training set and the evaluation set.
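One way such a split could be produced is sketched below. This is not the splitting code used for this model: it is a simplified, standard-library-only approximation that keeps every PMID on one side of the split and stratifies at the PMID level by each PMID's majority label. The function name, its arguments, and the toy data are assumptions made for illustration.

```python
import random
from collections import defaultdict


def grouped_stratified_split(pmids, labels, eval_fraction=0.15, seed=1):
    """Split row indices so that no PMID lands in both train and eval."""
    # Collect the row indices that belong to each PMID.
    groups = defaultdict(list)
    for i, pmid in enumerate(pmids):
        groups[pmid].append(i)

    # Bucket PMIDs by majority label to approximate label stratification.
    buckets = defaultdict(list)
    for pmid, idxs in groups.items():
        positives = sum(labels[i] for i in idxs)
        majority = int(2 * positives >= len(idxs))
        buckets[majority].append(pmid)

    rng = random.Random(seed)
    train_idx, eval_idx = [], []
    for key in sorted(buckets):
        group_ids = sorted(buckets[key])
        rng.shuffle(group_ids)
        # Send roughly eval_fraction of the PMIDs in each bucket to eval.
        n_eval = round(len(group_ids) * eval_fraction)
        for j, pmid in enumerate(group_ids):
            target = eval_idx if j < n_eval else train_idx
            target.extend(groups[pmid])
    return sorted(train_idx), sorted(eval_idx)


# Toy data: 40 hypothetical PMIDs with 3 gene-disease pairs each.
pmids = [i // 3 for i in range(120)]
labels = [(i // 3) % 2 for i in range(120)]
train_idx, eval_idx = grouped_stratified_split(pmids, labels)
overlap = {pmids[i] for i in train_idx} & {pmids[i] for i in eval_idx}
print(len(train_idx), len(eval_idx), len(overlap))
```

Because whole PMIDs are assigned together, the evaluation share is only approximately 15% of rows when abstracts contribute different numbers of pairs.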

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 1
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 5
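To make the `linear` scheduler setting concrete, the sketch below computes a linearly decaying learning rate starting from the listed 5e-05. It assumes zero warmup steps and a hypothetical total of 1000 optimizer steps; neither value is stated in this card, and the function is an illustration rather than the training code.

```python
def linear_lr(step, total_steps, base_lr=5e-05, warmup_steps=0):
    """Learning rate at a given step under linear warmup then linear decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = (total_steps - step) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, remaining)


# Hypothetical run length; the real number of steps depends on dataset size.
total_steps = 1000
for step in (0, 250, 500, 750, 1000):
    print(step, linear_lr(step, total_steps))
```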

Framework versions

  • Transformers 4.9.0.dev0
  • PyTorch 1.10.0.dev20210630+cu113
  • Datasets 1.8.0
  • Tokenizers 0.10.3