widget:
  - context: >-
      While deep and large pre-trained models are the state-of-the-art for
      various natural language processing tasks, their huge size poses
      significant challenges for practical uses in resource constrained
      settings. Recent works in knowledge distillation propose task-agnostic as
      well as task-specific methods to compress these models, with task-specific
      ones often yielding higher compression rate. In this work, we develop a
      new task-agnostic distillation framework XtremeDistilTransformers that
      leverages the advantage of task-specific methods for learning a small
      universal model that can be applied to arbitrary tasks and languages. To
      this end, we study the transferability of several source tasks,
      augmentation resources and model architecture for distillation. We
      evaluate our model performance on multiple tasks, including the General
      Language Understanding Evaluation (GLUE) benchmark, SQuAD question
      answering dataset and a massive multi-lingual NER dataset with 41
      languages.
    example_title: xtremedistil q1
    text: What is XtremeDistil?
  - context: >-
      While deep and large pre-trained models are the state-of-the-art for
      various natural language processing tasks, their huge size poses
      significant challenges for practical uses in resource constrained
      settings. Recent works in knowledge distillation propose task-agnostic as
      well as task-specific methods to compress these models, with task-specific
      ones often yielding higher compression rate. In this work, we develop a
      new task-agnostic distillation framework XtremeDistilTransformers that
      leverages the advantage of task-specific methods for learning a small
      universal model that can be applied to arbitrary tasks and languages. To
      this end, we study the transferability of several source tasks,
      augmentation resources and model architecture for distillation. We
      evaluate our model performance on multiple tasks, including the General
      Language Understanding Evaluation (GLUE) benchmark, SQuAD question
      answering dataset and a massive multi-lingual NER dataset with 41
      languages.
    example_title: xtremedistil q2
    text: On what is the model validated?
datasets:
  - squad_v2
metrics:
  - f1
  - exact_match
tags:
  - question-answering
model-index:
  - name: nbroad/xdistil-l12-h384-squad2
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - name: Exact Match
            type: exact_match
            value: 75.4591
            verified: true
          - name: F1
            type: f1
            value: 79.3321
            verified: true
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad
          type: squad
          config: plain_text
          split: validation
        metrics:
          - name: Exact Match
            type: exact_match
            value: 81.8604
            verified: true
          - name: F1
            type: f1
            value: 89.6654
            verified: true

xtremedistil-l12-h384 trained on SQuAD 2.0
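
A minimal usage sketch, assuming the checkpoint is published under the ID given in the model-index metadata (`nbroad/xdistil-l12-h384-squad2`) and that the `transformers` library is installed. The helper names `load_qa` , `ask`, and `demo` are hypothetical and exist only for this illustration. Because the model was trained on SQuAD 2.0, which contains unanswerable questions, the sketch passes `handle_impossible_answer=True` so that the pipeline may return an empty answer.

```python
def load_qa(model_id: str = "nbroad/xdistil-l12-h384-squad2"):
    """Build a question-answering pipeline (downloads the model on first use)."""
    # Imported lazily so the helpers below can be used without transformers.
    from transformers import pipeline

    return pipeline("question-answering", model=model_id)


def ask(qa, question: str, context: str) -> str:
    """Return the predicted answer span, or "" if the model predicts no answer."""
    result = qa(
        question=question,
        context=context,
        handle_impossible_answer=True,  # allow the SQuAD 2.0 "no answer" case
    )
    return result["answer"]


def demo() -> None:
    """Run one of the widget examples from this card (hypothetical helper)."""
    context = (
        "We evaluate our model performance on multiple tasks, including the "
        "General Language Understanding Evaluation (GLUE) benchmark, SQuAD "
        "question answering dataset and a massive multi-lingual NER dataset "
        "with 41 languages."
    )
    qa = load_qa()
    print(ask(qa, "On what is the model validated?", context))
```

Calling `demo()` requires network access to fetch the checkpoint; `ask` works with any callable that follows the pipeline's keyword interface.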

"eval_exact": 75.45691906005221
"eval_f1": 79.32502968532793
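
For readers unfamiliar with the two numbers above, here is a sketch of how exact match and F1 are typically computed for a single prediction in SQuAD-style evaluation. It assumes the scores on this card come from the standard SQuAD 2.0 metric, which the card does not state explicitly; the function names are illustrative. The reported values would then be per-example scores averaged over the validation set and scaled to 0-100, taking the best score over the reference answers for each question.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # Unanswerable case: an empty answer only matches an empty reference.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(true_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

A prediction that contains the reference plus extra tokens scores 0 on exact match but receives partial credit on F1, which is why F1 is the higher of the two numbers.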