bioformer-8L fine-tuned on the SQuAD1 dataset for 3 epochs.

Fine-tuning was performed on a single P100 GPU (16GB). The hyperparameters are:

max_seq_length=512
per_device_train_batch_size=16
gradient_accumulation_steps=1
total train batch size (w. parallel, distributed & accumulation) = 16
learning_rate=3e-5
num_train_epochs=3
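
The total train batch size listed above follows from the per-device batch size, the number of devices, and the gradient accumulation steps. A minimal sketch of that arithmetic is below; the function name `effective_train_batch_size` is hypothetical and is not part of any training script referenced here.

```python
def effective_train_batch_size(per_device_batch_size, num_devices=1,
                               gradient_accumulation_steps=1):
    """Number of examples that contribute to each optimizer step."""
    return per_device_batch_size * num_devices * gradient_accumulation_steps


# Values from the hyperparameter list: one GPU, batch size 16, no accumulation.
total = effective_train_batch_size(16, num_devices=1,
                                   gradient_accumulation_steps=1)
print(total)  # 16
```

With a smaller GPU, the same total of 16 could be reached with, for example, a per-device batch size of 8 and 2 accumulation steps.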

Evaluation results

"eval_exact_match": 78.55250709555345
"eval_f1": 85.91482799690257
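
For readers unfamiliar with these metrics, the sketch below shows how SQuAD-style exact match and token-level F1 are usually computed for a single prediction. It is a simplified illustration modeled on the normalization used by the standard SQuAD v1.1 evaluation (lowercasing, stripping punctuation and English articles), not the exact script that produced the numbers above; the example strings are made up.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, reference):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction, reference):
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_references(metric, prediction, references):
    """A question may have several gold answers; keep the best score."""
    return max(metric(prediction, ref) for ref in references)


# Made-up example with two gold answers.
prediction = "the p53 protein"
references = ["p53", "p53 protein"]
em = best_over_references(exact_match, prediction, references)
f1 = best_over_references(f1_score, prediction, references)
```

The reported scores are these per-question values averaged over the evaluation set and multiplied by 100.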

Bioformer's performance is on par with DistilBERT (EM/F1: 77.7/85.8), although Bioformer was pretrained only on biomedical texts.

Speed

In our experiments, Bioformer's inference is 3x as fast as BERT-base/BioBERT/PubMedBERT and 40% faster than DistilBERT.
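
A comparison of this kind can be reproduced with a small timing harness. The sketch below is a generic illustration, not the benchmark procedure used for the figures above; `measure_throughput` and `speedup` are hypothetical helper names, and `predict_fn` stands in for whatever callable runs a model on one input.

```python
import time


def measure_throughput(predict_fn, inputs, warmup=1, repeats=3):
    """Return examples per second, using the fastest of `repeats` passes."""
    for _ in range(warmup):
        for x in inputs:
            predict_fn(x)  # warm-up passes are not timed
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in inputs:
            predict_fn(x)
        best = min(best, time.perf_counter() - start)
    # Guard against a zero interval on very coarse clocks.
    return len(inputs) / max(best, 1e-12)


def speedup(throughput_a, throughput_b):
    """How many times faster model A is than model B."""
    return throughput_a / throughput_b


# Stand-in workload; replace the lambda with a real model call.
examples = ["question and context"] * 100
rate = measure_throughput(lambda text: text.split(), examples)
```

Assuming the claims refer to throughput, "3x as fast" corresponds to a `speedup` of 3.0 and "40% faster" to a `speedup` of 1.4. Results depend heavily on hardware, batch size, and sequence length, so both models should be measured under identical settings.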

Model size: 42.3M params (Safetensors; tensor types: I64, F32)