---
language: en
license: cc-by-4.0
tags:
- question-answering
datasets:
- squad_v2
metrics:
- f1
- exact
widget:
- context: DeBERTa improves the BERT and RoBERTa models using disentangled attention
    and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa
    on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further
    improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient
    Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly
    improves the model performance on downstream tasks. You can find more technique
    details about the new model from our paper. Please check the official repository
    for more implementation details and updates.
  example_title: DeBERTa v3 Q1
  text: How is DeBERTa version 3 different than previous ones?
- context: DeBERTa improves the BERT and RoBERTa models using disentangled attention
    and enhanced mask decoder. With those two improvements, DeBERTa out perform RoBERTa
    on a majority of NLU tasks with 80GB training data. In DeBERTa V3, we further
    improved the efficiency of DeBERTa using ELECTRA-Style pre-training with Gradient
    Disentangled Embedding Sharing. Compared to DeBERTa, our V3 version significantly
    improves the model performance on downstream tasks. You can find more technique
    details about the new model from our paper. Please check the official repository
    for more implementation details and updates.
  example_title: DeBERTa v3 Q2
  text: Where do I go to see new info about DeBERTa?
model-index:
- name: DeBERTa v3 xsmall squad2
  results:
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: SQuAD2.0
      type: question-answering
    metrics:
    - type: f1
      value: 81.5
      name: f1
    - type: exact
      value: 78.3
      name: exact
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad_v2
      type: squad_v2
      config: squad_v2
      split: validation
    metrics:
    - type: exact_match
      value: 78.5341
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTk0ZGQ1YjU1YmQ5NTc2M2RmNjg2OGViYjcyODZkOTc1MDBkNmI5MDc0MzEyMzZmNDg3Yzc4ZTA3ZjAwM2M5ZiIsInZlcnNpb24iOjF9.ewKF-UetUoxKDeXgnM6vqy8nBC9c3qh7dLZhdQlgSxPut3LjAhpCh2fJGir-OVcfzWzxsPhcZQEpdnxR8oZnAA
    - type: f1
      value: 81.6408
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiOTQwZDdjY2ZlOGVhM2E5NGM3OGNkNTk2NWFkYTg1Y2Q0YWFlYWJmMGIyZWM5ZjMyYTYyODUzMDA0NWU0ZGVkZCIsInZlcnNpb24iOjF9.BHJNhS1YisUIkjcpIMdwXurTewak9dkkpGXC2vHvUB4qUEuk_p3V-orhmeFyTxzLaWRwrZVGVz-NSfqFr4n1Ag
    - type: total
      value: 11870
      name: total
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiNzNiZDQ3MDAyNzljMDI4NTRlYzZiZjE4ODJhZDhmZWE2ZjcwNjg2ZWJmNjUyMTUzZDk4ODNjNDExYTk1YWNlOCIsInZlcnNpb24iOjF9.3BlfmMvbV86Ua39ToqnMmgpGS0ZTew0UFFYWGyTkS3u7jaAXCfYkFkNJXw806f2uFFkKr1hqlzzKfivV0wUjCg
  - task:
      type: question-answering
      name: Question Answering
    dataset:
      name: squad
      type: squad
      config: plain_text
      split: validation
    metrics:
    - type: exact_match
      value: 84.1741
      name: Exact Match
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiYTA0MDVlYWI5NzdiNjllM2NmZTYwYmQ5YzE0ODgwOTA3MWZjZDkxNDFmZDM1OTQzMzgwNWI4NDc5NThhM2VhZSIsInZlcnNpb24iOjF9.lc2nUBxSu2_0_a5lyVsV51UAmkE8WHDTwGHvt3n9zvCbcJ1ylOg2xovF0_j0hZS16lv1DEw5XV8EW_ZS7mfvBg
    - type: f1
      value: 91.0771
      name: F1
      verified: true
      verifyToken: eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiODQxMjkxOWJlZTc2MmE5YzVmMjNhOTkwNDdiMDBhNWUwMDU3MDI1MmJiNDY4MjczYjIwM2U1NDhlYmZlZWQwMSIsInZlcnNpb24iOjF9.x_axHiBX5d3UIi1UbJT3kVbdX4kX9XFLQSg-l16-AAK9tiyutT-yaYJOi8LSb2lR4677tJpf3itu4eriJRU2Cg
---



# DeBERTa v3 xsmall SQuAD 2.0

[Microsoft reports that this model reaches 84.8/82.0](https://huggingface.co/microsoft/deberta-v3-xsmall#fine-tuning-on-nlu-tasks) F1/EM on the SQuAD 2.0 dev set.

I got 81.5/78.3 F1/EM. That number comes from a single run, and I didn't score it with the official SQuAD 2.0 evaluation script, so treat it as preliminary. I will do some more runs and post the results from the official script soon.
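
For readers unfamiliar with the two metrics, here is a minimal sketch of how SQuAD-style exact match and token-level F1 are usually computed for a single prediction. It follows the normalization convention of the official script (lowercase, strip punctuation, drop English articles, collapse whitespace), but it is a simplified illustration and not the script used to produce the numbers on this card. The function names are my own.

```python
import collections
import re
import string


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation, drop English articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, truth: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction: str, truth: str) -> float:
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    true_tokens = normalize_answer(truth).split()
    # SQuAD 2.0 has unanswerable questions, where the reference is empty.
    # If either side is empty, the score is 1.0 only when both are empty.
    if not pred_tokens or not true_tokens:
        return float(pred_tokens == true_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    common = collections.Counter(pred_tokens) & collections.Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)
```

A dataset-level score is then the mean of these per-question values, scaled to 0-100. When a question has several reference answers, the usual convention is to take the best score over the references.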