biot5-small
Model description
T5 is an encoder-decoder model that treats all NLP problems in a text-to-text format.
BioT5 is a transformers model pretrained on a very large corpus of biological text (25 million abstracts) in a self-supervised fashion. This means it was pretrained on raw texts only, with no human labelling of any kind (which is why it can use lots of publicly available data), using an automatic process to generate inputs and outputs from those texts.
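The "automatic process" is, presumably, the standard T5 span-corruption objective: random spans of the raw text are replaced by sentinel tokens and the model learns to reconstruct them. The toy function below is only a sketch of that idea; the function name, parameters, and example sentence are illustrative and not taken from the BioT5 training code (real T5 pretraining corrupts about 15% of tokens across several spans, but the input/target structure is the same).

```python
import random

def span_corrupt(words, span_len=3, seed=0):
    """Toy single-span corruption: replace a random span of words with a
    sentinel token; the target asks the model to reconstruct that span.
    Both input and target are derived from the raw text alone, so no
    human labels are needed."""
    rng = random.Random(seed)
    start = rng.randrange(0, len(words) - span_len)
    inp = words[:start] + ["<extra_id_0>"] + words[start + span_len:]
    tgt = ["<extra_id_0>"] + words[start:start + span_len] + ["<extra_id_1>"]
    return " ".join(inp), " ".join(tgt)

text = "The p53 protein regulates the cell cycle and acts as a tumour suppressor"
inp, tgt = span_corrupt(text.split())
print(inp)  # e.g. "The p53 protein <extra_id_0> cell cycle and acts as a tumour suppressor"
print(tgt)  # e.g. "<extra_id_0> regulates the <extra_id_1>"
```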
This model uses the T5 v1.1 improvements over the original T5 model during pretraining:
- GEGLU activation in the feed-forward hidden layer, rather than ReLU - see here (a minimal sketch follows this list)
- Dropout was turned off during pretraining (quality win); dropout should be re-enabled during fine-tuning
- Pretrained on the self-supervised objective only, without mixing in the downstream tasks
- No parameter sharing between the embedding and classifier layers
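Below is a minimal PyTorch sketch of the gated-GELU (GEGLU) feed-forward block, following the wi_0/wi_1/wo naming used by the Hugging Face T5 implementation; the dimensions and dropout value are illustrative and not taken from this checkpoint's config.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GEGLUFeedForward(nn.Module):
    """Gated-GELU feed-forward block as used in T5 v1.1.

    d_model and d_ff are illustrative defaults, not this checkpoint's values.
    """
    def __init__(self, d_model=512, d_ff=1024, dropout=0.0):
        super().__init__()
        self.wi_0 = nn.Linear(d_model, d_ff, bias=False)  # gate projection
        self.wi_1 = nn.Linear(d_model, d_ff, bias=False)  # linear projection
        self.wo = nn.Linear(d_ff, d_model, bias=False)    # output projection
        self.dropout = nn.Dropout(dropout)  # 0.0 in pretraining; re-enable for fine-tuning

    def forward(self, x):
        # GEGLU: GELU(x @ W_gate) * (x @ W_linear), then project back to d_model.
        # The tanh approximation matches the "new GELU" commonly used in T5 v1.1.
        hidden = F.gelu(self.wi_0(x), approximate="tanh") * self.wi_1(x)
        return self.wo(self.dropout(hidden))
```

When fine-tuning with the Hugging Face transformers library, dropout can be switched back on by setting `dropout_rate` (e.g. to 0.1) on the model's `T5Config` before loading the weights.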
Acknowledgements
This project would not have been possible without compute generously provided by Google through the Google TPU Research Cloud. Thanks to Yeb Havinga and Gabriele Sarti for helping me get started with the t5x framework.