torchdistill: Distilling fine-tuned BERT-Large -> BERT-Base
GLUE leaderboard: https://gluebenchmark.com/leaderboard/
Code: https://github.com/yoshitomo-matsubara/torchdistill?tab=readme-ov-file#glue
How to use yoshitomo-matsubara/bert-base-uncased-cola_from_bert-large-uncased-cola with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="yoshitomo-matsubara/bert-base-uncased-cola_from_bert-large-uncased-cola")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("yoshitomo-matsubara/bert-base-uncased-cola_from_bert-large-uncased-cola")
model = AutoModelForSequenceClassification.from_pretrained("yoshitomo-matsubara/bert-base-uncased-cola_from_bert-large-uncased-cola")

bert-base-uncased fine-tuned on the CoLA dataset, using a fine-tuned bert-large-uncased as the teacher model, with torchdistill and Google Colab for knowledge distillation.
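For readers unfamiliar with knowledge distillation, the sketch below shows one common form of the student objective: a weighted sum of the usual cross-entropy on the hard label and a temperature-softened KL divergence from the teacher's predictions. It is only an illustration in plain Python. The function names and the `temperature` and `alpha` values are assumptions made for this example and are not taken from the training configuration of this model, which defines the actual loss and hyperparameters.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by `temperature`."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, label, temperature=1.0, alpha=0.5):
    """Illustrative distillation loss for a single example.

    alpha weights the hard-label cross-entropy; (1 - alpha) weights the
    KL divergence KL(teacher || student) computed on softened outputs and
    scaled by temperature ** 2.
    """
    # Cross-entropy of the student against the ground-truth label
    cross_entropy = -math.log(softmax(student_logits)[label])

    # KL divergence between softened teacher and student distributions
    teacher_probs = softmax(teacher_logits, temperature)
    student_probs = softmax(student_logits, temperature)
    kl = sum(
        pt * math.log(pt / ps)
        for pt, ps in zip(teacher_probs, student_probs)
        if pt > 0
    )

    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


# Hypothetical logits for one CoLA sentence (two classes)
loss = kd_loss([0.2, 1.1], [-0.5, 2.3], label=1, temperature=2.0, alpha=0.5)
print(loss)
```

When the student reproduces the teacher's logits exactly, the KL term is zero and only the cross-entropy term remains.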
The training configuration (including hyperparameters) is available here.
I submitted prediction files to the GLUE leaderboard, and the overall GLUE score was 78.9.
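As background on the CoLA part of that score: GLUE evaluates CoLA with the Matthews correlation coefficient (MCC), and the overall GLUE score averages the per-task results. The following is a minimal sketch of MCC for binary labels. The labels and predictions are made up for illustration and are not outputs of this model.

```python
import math


def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when the denominator is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Made-up example: 1 = acceptable sentence, 0 = unacceptable sentence
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 1, 0, 0, 1, 1]
print(matthews_corrcoef(y_true, y_pred))  # 0.25
```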
Yoshitomo Matsubara: "torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP" at EMNLP 2023 Workshop for Natural Language Processing Open Source Software (NLP-OSS)
[Paper] [OpenReview] [Preprint]
@inproceedings{matsubara2023torchdistill,
title={{torchdistill Meets Hugging Face Libraries for Reproducible, Coding-Free Deep Learning Studies: A Case Study on NLP}},
author={Matsubara, Yoshitomo},
booktitle={Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)},
publisher={Empirical Methods in Natural Language Processing},
pages={153--164},
year={2023}
}