---
language: en
license: cc-by-4.0
tags:
- question-answering
datasets:
- squad_v2
metrics:
- f1
- exact
widget:
- context: DeBERTa improves the BERT and RoBERTa models using disentangled attention
    and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa
    on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further
    improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient
    Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly
    improves the model performance on downstream tasks. You can find more technique
    details about the new model from our paper. Please check the official repository
    for more implementation details and updates.
  example_title: DeBERTa v3 Q1
  text: How is DeBERTa version 3 different than previous ones?
- context: DeBERTa improves the BERT and RoBERTa models using disentangled attention
    and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa
    on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further
    improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient
    Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly
    improves the model performance on downstream tasks. You can find more technique
    details about the new model from our paper. Please check the official repository
    for more implementation details and updates.
  example_title: DeBERTa v3 Q2
  text: Where do I go to see new info about DeBERTa?
model-index:
- name: DeBERTa v3 xsmall squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: SQuAD2.0
      type: question-answering
    metrics:
    - type: f1
      value: 81.5
      name: f1
    - type: exact
      value: 78.3
      name: exact
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 78.5341
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTk0ZGQ1YjU1YmQ5NTc2M2RmNjg2OGViYjcyODZkOTc1MDBkNmI5MDc0MzEyMzZmNDg3Yzc4ZTA3ZjAwM2M5ZiIsInZlcnNpb24iOjF9.ewKF-UetUoxKDeXgnM6vqy8nBC9c3qh7dLZhdQlgSxPut3LjAhpCh2fJGir-OVcfzWzxsPhcZQEpdnxR8oZnAA
    - type: f1
      value: 81.6408
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTQwZDdjY2ZlOGVhM2E5NGM3OGNkNTk2NWFkYTg1Y2Q0YWFlYWJmMGIyZWM5ZjMyYTYyODUzMDA0NWU0ZGVkZCIsInZlcnNpb24iOjF9.BHJNhS1YisUIkjcpIMdwXurTewak9dkkpGXC2vHvUB4qUEuk_p3V-orhmeFyTxzLaWRwrZVGVz-NSfqFr4n1Ag
    - type: total
      value: 11870
      name: total
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzNiZDQ3MDAyNzljMDI4NTRlYzZiZjE4ODJhZDhmZWE2ZjcwNjg2ZWJmNjUyMTUzZDk4ODNjNDExYTk1YWNlOCIsInZlcnNpb24iOjF9.3BlfmMvbV86Ua39ToqnMmgpGS0ZTew0UFFYWGyTkS3u7jaAXCfYkFkNJXw806f2uFFkKr1hqlzzKfivV0wUjCg
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 84.1741
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTA0MDVlYWI5NzdiNjllM2NmZTYwYmQ5YzE0ODgwOTA3MWZjZDkxNDFmZDM1OTQzMzgwNWI4NDc5NThhM2VhZSIsInZlcnNpb24iOjF9.lc2nUBxSu2_0_a5lyVsV51UAmkE8WHDTwGHvt3n9zvCbcJ1ylOg2xovF0_j0hZS16lv1DEw5XV8EW_ZS7mfvBg
    - type: f1
      value: 91.0771
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODQxMjkxOWJlZTc2MmE5YzVmMjNhOTkwNDdiMDBhNWUwMDU3MDI1MmJiNDY4MjczYjIwM2U1NDhlYmZlZWQwMSIsInZlcnNpb24iOjF9.x_axHiBX5d3UIi1UbJT3kVbdX4kX9XFLQSg-l16-AAK9tiyutT-yaYJOi8LSb2lR4677tJpf3itu4eriJRU2Cg
---
# DeBERTa v3 xsmall SQuAD 2.0

[Microsoft reports that this model can get 84.8/82.0](https://huggingface.co/microsoft/deberta-v3-xsmall#fine-tuning-on-nlu-tasks) F1/EM on the dev set.

I got 81.5/78.3 (F1/EM), but I only did one run and did not use the official SQuAD 2.0 evaluation script. I will do more runs and report results from the official script soon.
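
For readers unfamiliar with the two numbers, here is a minimal sketch of how SQuAD-style exact match (EM) and token-level F1 are usually computed for a single prediction. It is a simplified illustration, not the official evaluation script and not necessarily the code behind the scores above. The function names are illustrative, and an unanswerable question is assumed to be represented by an empty string.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    if not pred_tokens or not ref_tokens:
        # Assumption: an empty string means "no answer".
        # Score 1.0 only when both sides are empty.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(exact_match("The official repository", "official repository."))
    print(f1_score("in the official repository", "official repository"))
```

A full evaluation would take the best score over all reference answers for each question and average over the dataset, which this sketch leaves out.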