yoshitomo-matsubara/bert-base-uncased-stsb_from_bert-large-uncased-stsb

bert-base-uncased fine-tuned on the STS-B dataset through knowledge distillation, with a fine-tuned bert-large-uncased as the teacher model. Training was done with torchdistill on Google Colab.
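
To illustrate the teacher-student setup on a regression task such as STS-B, here is a minimal, framework-free sketch of a distillation objective. It is an assumption for illustration only: the function names, the mean-squared-error form, and the `alpha` weighting are hypothetical, and the loss actually used for this model is whatever the training configuration specifies.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error between two equally sized score lists."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def distillation_loss(
    student: Sequence[float],
    teacher: Sequence[float],
    labels: Sequence[float],
    alpha: float = 0.5,  # hypothetical weight, not taken from the actual config
) -> float:
    """Blend the error against gold labels with the error against teacher outputs."""
    hard_term = mse(student, labels)   # student vs. ground-truth similarity scores
    soft_term = mse(student, teacher)  # student vs. teacher (bert-large) predictions
    return alpha * hard_term + (1.0 - alpha) * soft_term


# Toy similarity scores for two sentence pairs.
loss = distillation_loss(student=[2.0, 4.0], teacher=[2.5, 3.5], labels=[3.0, 5.0])
print(loss)
```

With `alpha=1.0` the sketch reduces to ordinary fine-tuning on the labels, and with `alpha=0.0` the student only imitates the teacher.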
The training configuration (including hyperparameters) is available here.
I submitted prediction files to the GLUE leaderboard, and the overall GLUE score was 78.9.
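
A possible way to try the model with the Hugging Face transformers library is sketched below. It assumes that `transformers` and `torch` are installed and that the checkpoint exposes a single regression output, as is usual for STS-B; the helper name `predict_similarity` is hypothetical and not part of this repository.

```python
MODEL_ID = "yoshitomo-matsubara/bert-base-uncased-stsb_from_bert-large-uncased-stsb"


def predict_similarity(sentence1: str, sentence2: str, model_id: str = MODEL_ID) -> float:
    """Return the predicted similarity score for a sentence pair."""
    # Imported lazily so that defining the helper needs no heavy dependencies.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    # Downloads the tokenizer and weights from the Hugging Face Hub on first use.
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSequenceClassification.from_pretrained(model_id)
    model.eval()

    # Encode both sentences as one paired input.
    inputs = tokenizer(sentence1, sentence2, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits

    # Assumption: one regression output per pair, so the logits hold a single value.
    return logits.squeeze().item()
```

Calling `predict_similarity("A man is playing a guitar.", "A person plays an instrument.")` should return a float; STS-B labels range from 0 to 5, so predictions are expected to fall roughly in that interval.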

Yoshitomo Matsubara: "torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP" at EMNLP 2023 Workshop for Natural Language Processing Open Source Software (NLP-OSS)

[Paper] [OpenReview] [Preprint]

@inproceedings{matsubara2023torchdistill,
  title={{torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP}},
  author={Matsubara, Yoshitomo},
  booktitle={Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)},
  publisher={Empirical Methods in Natural Language Processing},
  pages={153--164},
  year={2023}
}
