osanseviero's picture
osanseviero HF staff
Add evaluation results on adversarial_qa dataset (#2)
f9bc733
metadata
language:
  - en
thumbnail: https://github.com/karanchahal/distiller/blob/master/distiller.jpg
tags:
  - question-answering
license: apache-2.0
datasets:
  - squad
metrics:
  - squad
model-index:
  - name: osanseviero/distilbert-base-uncased-finetuned-squad-d5716d28
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: adversarial_qa
          type: adversarial_qa
          config: adversarialQA
          split: train
        metrics:
          - name: Loss
            type: loss
            value: 4.052208423614502
            verified: true

DistilBERT with a second step of distillation

Model description

This model replicates the "DistilBERT (D)" model from Table 2 of the DistilBERT paper. In this approach, a DistilBERT student is fine-tuned on SQuAD v1.1, but with a BERT model (also fine-tuned on SQuAD v1.1) acting as a teacher for a second step of task-specific distillation.

In this version, the following pre-trained models were used:

  • Student: distilbert-base-uncased
  • Teacher: lewtun/bert-base-uncased-finetuned-squad-v1

Training data

This model was trained on the SQuAD v1.1 dataset which can be obtained from the datasets library as follows:

from datasets import load_dataset
squad = load_dataset('squad')

Training procedure

Eval results

Exact Match F1
DistilBERT paper 79.1 86.9
Ours 78.4 86.5

The scores were calculated using the squad metric from datasets.

BibTeX entry and citation info

@misc{sanh2020distilbert,
      title={DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter}, 
      author={Victor Sanh and Lysandre Debut and Julien Chaumond and Thomas Wolf},
      year={2020},
      eprint={1910.01108},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}