bert-base-uncased fine-tuned on the QQP dataset through knowledge distillation, using a fine-tuned bert-large-uncased as the teacher model. Training was done with torchdistill on Google Colab.
The training configuration (including hyperparameters) is available here.
I submitted prediction files to the GLUE leaderboard, and the overall GLUE score was 78.9.
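
As a rough illustration of the knowledge distillation setup described above, here is a minimal sketch of a generic soft-target distillation loss for a two-class task such as QQP. It is not taken from the training configuration: the function names, the temperature of 2.0, and the weight alpha of 0.5 are all assumptions chosen for illustration, and the loss actually used for this model is whatever the linked configuration specifies. It uses only the Python standard library so it can be run without PyTorch; in a real training loop the same arithmetic would be applied to batched tensors.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Generic soft-target distillation loss for one example.

    temperature and alpha are illustrative defaults (assumptions),
    not the values used to train this model.
    """
    # Hard-label term: cross-entropy of the student against the true label.
    student_probs = softmax(student_logits)
    cross_entropy = -math.log(student_probs[label])

    # Soft-label term: KL(teacher || student) on temperature-softened outputs.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s)
             for t, s in zip(teacher_soft, student_soft) if t > 0)

    # Scaling the soft term by temperature**2 is the common convention
    # for keeping the two terms on a comparable scale.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl


if __name__ == "__main__":
    # Hypothetical logits for one question pair (class 1 = duplicate).
    student = [0.2, 1.1]
    teacher = [-0.5, 2.3]
    print(distillation_loss(student, teacher, label=1))
```

When the student and teacher produce identical logits, the soft-label term is zero and only the weighted cross-entropy remains, so the loss rewards the student both for matching the labels and for matching the teacher's output distribution.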
