Model Description

TinyBioBERT is a distilled version of BioBERT, trained for 100k steps with a total batch size of 192 on the PubMed dataset.
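
The snippet below is a minimal sketch of how the checkpoint could be loaded with the Hugging Face `transformers` library to produce sentence vectors. It assumes the Hub repository id `nlpie/tiny-biobert` and that `torch` and `transformers` are installed. The helper name `embed` and the choice of the `[CLS]` vector as a sentence representation are illustrative, not part of the model's documentation.

```python
MODEL_ID = "nlpie/tiny-biobert"  # assumed Hub repository id


def embed(texts, model_id=MODEL_ID):
    """Return one [CLS] vector per input text (hypothetical helper)."""
    # Imported lazily so that importing this module does not require
    # the heavy dependencies or network access.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()  # disable dropout for inference

    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)

    # First token of each sequence is [CLS]; shape is (batch, hidden_size).
    return outputs.last_hidden_state[:, 0]


# Example (downloads the checkpoint on first use):
# vectors = embed(["Aspirin inhibits platelet aggregation."])
# print(vectors.shape)
```

For downstream tasks such as named entity recognition, the encoder would normally be fine-tuned with a task-specific head rather than used as a frozen feature extractor.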

Distillation Procedure

This model uses a distillation method called ‘transformer-layer distillation’, which is applied to each layer of the student to align its attention maps and hidden states with those of the teacher.
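
As a rough illustration, the per-layer objective can be written in the form used by TinyBERT-style distillation. This is an assumed formulation, not one confirmed by this card, and all symbols are introduced here for illustration. $A_i$ is the attention matrix of head $i$, $H$ is the matrix of hidden states, and superscripts $S$ and $T$ denote student and teacher. $h$ is the number of attention heads (assumed equal in both models), $g(m)$ maps student layer $m$ to a teacher layer, $M$ is the number of student layers, and $W_h$ is a learnable projection from the student's hidden width to the teacher's.

```latex
% Assumed TinyBERT-style formulation; all symbols are illustrative.
\begin{align}
\mathcal{L}_{\text{attn}}^{(m)} &= \frac{1}{h} \sum_{i=1}^{h}
    \operatorname{MSE}\!\left(A_i^{S,(m)},\, A_i^{T,(g(m))}\right) \\
\mathcal{L}_{\text{hidn}}^{(m)} &=
    \operatorname{MSE}\!\left(H^{S,(m)} W_h,\; H^{T,(g(m))}\right) \\
\mathcal{L}_{\text{layer}} &= \sum_{m=1}^{M}
    \left(\mathcal{L}_{\text{attn}}^{(m)} + \mathcal{L}_{\text{hidn}}^{(m)}\right)
\end{align}
```

The projection $W_h$ is needed only when the student is narrower than the teacher. With equal widths, the hidden states could be compared directly.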

Architecture and Initialisation

This model uses 4 hidden layers with a hidden dimension size and an embedding size of 312, resulting in a total of 15M parameters. Because of the model's small hidden dimension size, it uses random initialisation rather than weights copied from the teacher.
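
To see how layer count and hidden size translate into a parameter total, the sketch below estimates the size of a standard BERT encoder from its configuration values. The function name and its defaults (512 positions, 2 token types, a pooler layer) are assumptions based on the usual BERT layout, and the embedding size is taken to equal the hidden size. The vocabulary size and feed-forward width are not stated here and would need to be read from the checkpoint's `config.json`.

```python
def count_bert_parameters(vocab_size, hidden_size, num_layers, intermediate_size,
                          max_positions=512, type_vocab_size=2, include_pooler=True):
    """Estimate the trainable parameters of a standard BERT encoder."""
    h = hidden_size
    layer_norm = 2 * h  # scale and bias

    # Token, position and segment embeddings, followed by a LayerNorm.
    embeddings = (vocab_size + max_positions + type_vocab_size) * h + layer_norm

    # Query, key, value and output projections (weights + biases), plus LayerNorm.
    attention = 4 * (h * h + h) + layer_norm

    # Two dense layers of the feed-forward block, plus LayerNorm.
    feed_forward = (h * intermediate_size + intermediate_size) \
        + (intermediate_size * h + h) + layer_norm

    # Dense layer applied to the [CLS] vector.
    pooler = h * h + h if include_pooler else 0

    return embeddings + num_layers * (attention + feed_forward) + pooler


if __name__ == "__main__":
    # Sanity check against the well-known BERT-base configuration.
    print(count_bert_parameters(30522, 768, 12, 3072))  # 109482240
```

Because the token embedding table scales with `vocab_size * hidden_size`, a small hidden size shrinks the embeddings as well as the transformer layers. This is why compact models of this kind reduce width and not only depth.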

Citation

If you use this model, please consider citing the following paper:

@article{rohanian2023effectiveness,
  title={On the effectiveness of compact biomedical transformers},
  author={Rohanian, Omid and Nouriborji, Mohammadmahdi and Kouchaki, Samaneh and Clifton, David A},
  journal={Bioinformatics},
  volume={39},
  number={3},
  pages={btad103},
  year={2023},
  publisher={Oxford University Press}
}

Support

If this model helps your work, you can keep the project running with a one-off or monthly contribution:
https://github.com/sponsors/nlpie-research
