metadata
license: cc-by-4.0
widget:
  - context: >-
      DeBERTa improves the BERT and RoBERTa models using disentangled attention
      and enhanced mask decoder. With those two improvements, DeBERTa
      outperforms RoBERTa on a majority of NLU tasks with 80GB training data. In
      DeBERTa V3, we further improved the efficiency of DeBERTa using
      ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing.
      Compared to DeBERTa, our V3 version significantly improves the model
      performance on downstream tasks. You can find more technical details about
      the new model from our paper. Please check the official repository for
      more implementation details and updates.
    example_title: DeBERTa v3 Q1
    text: How is DeBERTa version 3 different than previous ones?
  - context: >-
      DeBERTa improves the BERT and RoBERTa models using disentangled attention
      and enhanced mask decoder. With those two improvements, DeBERTa
      outperforms RoBERTa on a majority of NLU tasks with 80GB training data. In
      DeBERTa V3, we further improved the efficiency of DeBERTa using
      ELECTRA-Style pre-training with Gradient Disentangled Embedding Sharing.
      Compared to DeBERTa, our V3 version significantly improves the model
      performance on downstream tasks. You can find more technical details about
      the new model from our paper. Please check the official repository for
      more implementation details and updates.
    example_title: DeBERTa v3 Q2
    text: Where do I go to see new info about DeBERTa?
datasets:
  - squad_v2
metrics:
  - f1
  - exact
tags:
  - question-answering
language: en
model-index:
  - name: DeBERTa v3 xsmall squad2
    results:
      - task:
          name: Question Answering
          type: question-answering
        dataset:
          name: SQuAD2.0
          type: squad_v2
        metrics:
          - name: f1
            type: f1
            value: 81.5
          - name: exact
            type: exact
            value: 78.3
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - name: Exact Match
            type: exact_match
            value: 78.5341
            verified: true
          - name: F1
            type: f1
            value: 81.6408
            verified: true
          - name: total
            type: total
            value: 11870
            verified: true

DeBERTa v3 xsmall SQuAD 2.0

Microsoft reports that this model reaches 84.8 F1 / 82.0 exact match (EM) on the SQuAD 2.0 dev set.

I got 81.5 F1 / 78.3 EM, but that is from a single run and I didn't use the official SQuAD 2.0 evaluation script. I will do some more runs and report results from the official script soon.
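
For readers unfamiliar with the two metrics, here is a minimal sketch of how SQuAD-style exact match and token-level F1 are commonly computed for a single prediction/reference pair. It is not the official SQuAD 2.0 evaluation script and not necessarily what produced the numbers here; the function names are illustrative, and it assumes the usual answer normalization (lowercasing, dropping punctuation and English articles). It does not handle multiple reference answers or no-answer thresholds.

```python
import re
import string
from collections import Counter


def normalize_answer(text: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-overlap F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    ref_tokens = normalize_answer(reference).split()
    # An empty string stands for "no answer": score 1.0 only if both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

Dataset-level scores such as those in the metadata are averages of these per-question values, expressed as percentages.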