
Overview

Language model: Shobhank-iiitdwd/DistBERT-squad2-QA
Language: English
Training data: SQuAD 2.0 training set augmented 20x, plus the unaugmented SQuAD 2.0 training set
Eval data: SQuAD 2.0 dev set
Infrastructure: 1x V100 GPU
Published: Dec 8th, 2021

Details

  • Haystack's intermediate-layer and prediction-layer distillation features were used for training (see the reader setup sketched below and the stage sketches under Hyperparameters). bert-base-uncased-squad2 was used as the teacher model and DBERT_General_6L_768D as the student model.
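
A minimal sketch of that setup with Haystack 1.x's FARMReader. The teacher hub path (deepset/bert-base-uncased-squad2) and the local student path are assumptions inferred from the names above, not taken from the original training script.

```python
# Sketch only: load the teacher and student readers with Haystack 1.x.
# The teacher hub path and the local student path are assumptions.
from haystack.nodes import FARMReader

teacher = FARMReader(model_name_or_path="deepset/bert-base-uncased-squad2", use_gpu=True)
student = FARMReader(model_name_or_path="DBERT_General_6L_768D", use_gpu=True)
```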

Hyperparameters

Intermediate layer distillation

batch_size = 26
n_epochs = 5
max_seq_len = 384
learning_rate = 5e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
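
A hedged sketch of this stage, continuing from the reader setup above. The data directory and file names are placeholders, and the exact keyword names of FARMReader.distil_intermediate_layers_from may vary between Haystack versions.

```python
# Stage 1 (sketch): distil the teacher's intermediate-layer representations
# into the student, using the hyperparameters listed above.
student.distil_intermediate_layers_from(
    teacher_model=teacher,
    data_dir="data/squad20",      # placeholder for the augmented SQuAD 2.0 data
    train_filename="train.json",  # placeholder file name
    batch_size=26,
    n_epochs=5,
    max_seq_len=384,
    learning_rate=5e-5,
    temperature=1,
)
```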

Prediction layer distillation

batch_size = 26
n_epochs = 5
max_seq_len = 384
learning_rate = 3e-5
lr_schedule = LinearWarmup
embeds_dropout_prob = 0.1
temperature = 1
distillation_loss_weight = 1.0
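
A matching sketch for the second stage, again continuing from the same teacher and student objects; keyword names follow Haystack 1.x's FARMReader.distil_prediction_layer_from and should be checked against the installed version.

```python
# Stage 2 (sketch): distil the teacher's prediction-layer (start/end logit)
# distribution into the student from stage 1, then save the result.
student.distil_prediction_layer_from(
    teacher_model=teacher,
    data_dir="data/squad20",      # placeholder for the augmented SQuAD 2.0 data
    train_filename="train.json",  # placeholder file name
    batch_size=26,
    n_epochs=5,
    max_seq_len=384,
    learning_rate=3e-5,
    temperature=1,
    distillation_loss_weight=1.0,
)
student.save(directory="DistBERT-squad2-QA")  # output directory name is an assumption
```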

Performance

"exact": 71.87736882001179
"f1": 76.36111895973675