---
widget:
- context: While deep and large pre-trained models are the state-of-the-art for various
natural language processing tasks, their huge size poses significant challenges
for practical uses in resource constrained settings. Recent works in knowledge
distillation propose task-agnostic as well as task-specific methods to compress
these models, with task-specific ones often yielding higher compression rate.
In this work, we develop a new task-agnostic distillation framework XtremeDistilTransformers
that leverages the advantage of task-specific methods for learning a small universal
model that can be applied to arbitrary tasks and languages. To this end, we study
the transferability of several source tasks, augmentation resources and model
architecture for distillation. We evaluate our model performance on multiple tasks,
including the General Language Understanding Evaluation (GLUE) benchmark, SQuAD
question answering dataset and a massive multi-lingual NER dataset with 41 languages.
example_title: xtremedistil q1
text: What is XtremeDistil?
- context: While deep and large pre-trained models are the state-of-the-art for various
natural language processing tasks, their huge size poses significant challenges
for practical uses in resource constrained settings. Recent works in knowledge
distillation propose task-agnostic as well as task-specific methods to compress
these models, with task-specific ones often yielding higher compression rate.
In this work, we develop a new task-agnostic distillation framework XtremeDistilTransformers
that leverages the advantage of task-specific methods for learning a small universal
model that can be applied to arbitrary tasks and languages. To this end, we study
the transferability of several source tasks, augmentation resources and model
architecture for distillation. We evaluate our model performance on multiple tasks,
including the General Language Understanding Evaluation (GLUE) benchmark, SQuAD
question answering dataset and a massive multi-lingual NER dataset with 41 languages.
example_title: xtremedistil q2
text: On what is the model validated?
datasets:
- squad_v2
metrics:
- f1
- exact
tags:
- question-answering
model-index:
- name: nbroad/xdistil-l12-h384-squad2
results:
- task:
type: question-answering
name: Question Answering
dataset:
name: squad_v2
type: squad_v2
config: squad_v2
split: validation
metrics:
- name: Exact Match
type: exact_match
value: 75.4591
verified: true
- name: F1
type: f1
value: 79.3321
verified: true
- task:
type: question-answering
name: Question Answering
dataset:
name: squad
type: squad
config: plain_text
split: validation
metrics:
- name: Exact Match
type: exact_match
value: 81.8604
verified: true
- name: F1
type: f1
value: 89.6654
verified: true
---
# xtremedistil-l12-h384 trained on SQuAD 2.0

Evaluation results on the SQuAD 2.0 validation set:

- exact match (`eval_exact`): 75.45691906005221
- F1 (`eval_f1`): 79.32502968532793