

biot5-small

Model description

T5 is an encoder-decoder model and treats all NLP problems in a text-to-text format.

BioT5 is a transformers model pretrained in a self-supervised fashion on a very large corpus of biological text (25 million abstracts). This means it was pretrained on raw text only, with no human labelling of any kind (which is why it can use large amounts of publicly available data), using an automatic process to generate inputs and targets from those texts.
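The automatic input/target generation T5 uses is span corruption: random spans of the input are replaced by sentinel tokens, and the target reconstructs the masked spans. A minimal sketch, with a hand-picked span list for illustration (the real objective samples spans randomly, masking roughly 15% of tokens):

```python
# T5-style sentinel tokens, as used by the tokenizer
SENTINELS = [f"<extra_id_{i}>" for i in range(100)]

def span_corrupt(tokens, spans):
    """Build (input, target) pairs from a token list.

    spans: sorted, non-overlapping (start, length) pairs to mask.
    Illustrative sketch only; real T5 samples spans at random.
    """
    inp, tgt = [], []
    prev = 0
    for sid, (start, length) in enumerate(spans):
        inp.extend(tokens[prev:start])      # keep unmasked text
        inp.append(SENTINELS[sid])          # sentinel marks the gap
        tgt.append(SENTINELS[sid])          # target echoes the sentinel...
        tgt.extend(tokens[start:start + length])  # ...then the masked span
        prev = start + length
    inp.extend(tokens[prev:])
    tgt.append(SENTINELS[len(spans)])       # final sentinel terminates the target
    return inp, tgt

tokens = "the patient was given aspirin daily".split()
inp, tgt = span_corrupt(tokens, [(1, 1), (4, 1)])
# inp: the <extra_id_0> was given <extra_id_1> daily
# tgt: <extra_id_0> patient <extra_id_1> aspirin <extra_id_2>
```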

This model uses the T5 v1.1 improvements over the original T5 model during pretraining:

- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see here
- Dropout was turned off during pretraining (a quality win); dropout should be re-enabled during fine-tuning
- Pretrained on the self-supervised objective only, without mixing in downstream tasks
- No parameter sharing between the embedding and classifier layers
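The GEGLU change above replaces the single ReLU feed-forward projection with a gated variant: one projection is passed through GELU and multiplied elementwise by a second, linear projection. A minimal NumPy sketch (weight names `W`, `V`, `W_out` are illustrative, not the checkpoint's parameter names):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def relu_ffn(x, W, W_out):
    # Original T5 feed-forward block: max(xW, 0) W_out
    return np.maximum(x @ W, 0) @ W_out

def geglu_ffn(x, W, V, W_out):
    # T5 v1.1 feed-forward block: gelu(xW) gated by a linear projection xV
    return (gelu(x @ W) * (x @ V)) @ W_out

rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4))                 # (batch, d_model)
W = rng.normal(size=(4, 8))                 # d_model -> d_ff
V = rng.normal(size=(4, 8))                 # gate projection (extra params vs ReLU)
W_out = rng.normal(size=(8, 4))             # d_ff -> d_model
y = geglu_ffn(x, W, V, W_out)               # same output shape as relu_ffn
```

Note the gate adds a second `d_model x d_ff` weight matrix, which is why T5 v1.1 shrinks `d_ff` to keep parameter counts comparable.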

Acknowledgements

This project would not have been possible without compute generously provided by Google through the Google TPU Research Cloud. Thanks to Yeb Havinga and Gabriele Sarti for helping me get started with the t5x framework.
