DistilBERT with a second step of distillation

Model description

This model replicates the "DistilBERT (D)" model from Table 2 of the DistilBERT paper. In this approach, a DistilBERT student is fine-tuned on SQuAD v1.1, but with a BERT model (also fine-tuned on SQuAD v1.1) acting as a teacher for a second step of task-specific distillation.

In this version, the following pre-trained models were used:

  • Student: distilbert-base-uncased
  • Teacher: lewtun/bert-base-uncased-finetuned-squad-v1

Training data

This model was trained on the SQuAD v1.1 dataset, which can be loaded with the datasets library as follows:

from datasets import load_dataset
squad = load_dataset('squad')

Training procedure
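
The exact training hyperparameters are not reproduced here. As a minimal sketch, the snippet below illustrates the second, task-specific distillation step described above: the student's usual span-extraction loss is combined with a temperature-scaled KL-divergence term against the teacher's start/end logits. The temperature, loss weights, and batch layout are assumptions for illustration, not the exact recipe used to produce this checkpoint.

import torch
import torch.nn.functional as F
from transformers import AutoModelForQuestionAnswering

# Student and teacher checkpoints named in this card.
student = AutoModelForQuestionAnswering.from_pretrained("distilbert-base-uncased")
teacher = AutoModelForQuestionAnswering.from_pretrained(
    "lewtun/bert-base-uncased-finetuned-squad-v1"
)
teacher.eval()

# Assumed distillation hyperparameters.
temperature = 2.0   # softens both logit distributions
alpha_ce = 0.5      # weight of the hard-label span loss
alpha_kd = 0.5      # weight of the distillation (KL) loss

def distillation_loss(batch):
    """Combine the standard SQuAD span loss with a KL term against the teacher.

    `batch` is assumed to contain input_ids, attention_mask, and the gold
    start_positions / end_positions (so the student returns its own loss).
    """
    outputs = student(**batch)
    with torch.no_grad():
        teacher_outputs = teacher(
            input_ids=batch["input_ids"], attention_mask=batch["attention_mask"]
        )

    # KL divergence between temperature-scaled start and end logits.
    kd_loss = 0.0
    for s_logits, t_logits in (
        (outputs.start_logits, teacher_outputs.start_logits),
        (outputs.end_logits, teacher_outputs.end_logits),
    ):
        kd_loss = kd_loss + F.kl_div(
            F.log_softmax(s_logits / temperature, dim=-1),
            F.softmax(t_logits / temperature, dim=-1),
            reduction="batchmean",
        ) * (temperature ** 2)

    return alpha_ce * outputs.loss + alpha_kd * kd_loss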

Eval results

Model              Exact Match   F1
DistilBERT paper   79.1          86.9
Ours               78.4          86.5

The scores were calculated using the squad metric from datasets.
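As an illustration, the snippet below shows how exact match and F1 can be computed with that metric; the prediction/reference pair is a made-up example, not data from this evaluation.

from datasets import load_metric

squad_metric = load_metric("squad")

# Toy prediction/reference pair in the format the squad metric expects.
predictions = [{"id": "1", "prediction_text": "Denver Broncos"}]
references = [{
    "id": "1",
    "answers": {"text": ["Denver Broncos"], "answer_start": [177]},
}]

print(squad_metric.compute(predictions=predictions, references=references))
# {'exact_match': 100.0, 'f1': 100.0}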

BibTeX entry and citation info

@misc{sanh2020distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter}, 
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      year={2020},
      eprint={1910.01108},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}