Add verifyToken field to verify evaluation results are produced by Hugging Face's automatic model evaluator (#3)
license: apache-2.0
datasets:
  - squad_v2
model-index:
  - name: nlpconnect/deberta-v3-xsmall-squad2
    results:
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad_v2
          type: squad_v2
          config: squad_v2
          split: validation
        metrics:
          - type: exact_match
            value: 79.3917
            name: Exact Match
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZTFiMWI5YzFlMDZhMzc2NDIwYjNiZmIyMThmOWQxYjFjZmM2ZDQ0OGM2NmNlNmI3Y2U2N2JjMmVkZTgyZjNiOCIsInZlcnNpb24iOjF9.MCw9UJ3MI3Lf5hvOgk7Lw2xZfN4678p7ebG3vnGXX_Avw6fELTPwxZ9qGA-9tL00p4NxaSb3Cx6XAFvWetAIBA
          - type: f1
            value: 82.6738
            name: F1
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiMjdiYWY2MzU4YjZhMWQzZGJhZTk3NzU3Y2UwYmQ4MzliZmQxOGUxZDllN2Y0ZmZhYjVlNTE0MzY1MjU5OWMwMCIsInZlcnNpb24iOjF9.zeWLwXy77n0YKxGA5gjySe8p-_nPQxbiPnvQU2tF45IyMmlYKUuLeq4hJnNe-5NgriTf8xkBJBE7Cr5lWHy_Cw
      - task:
          type: question-answering
          name: Question Answering
        dataset:
          name: squad
          type: squad
          config: plain_text
          split: validation
        metrics:
          - type: exact_match
            value: 84.9246
            name: Exact Match
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZGJhYmU0Y2I4Y2UyOGVlOTlkMmQ2OTcyMTZkNTkwNTMzNzhmNzZiYjU4ZDkxMGM5NzAyMjk1M2ExNGIzOWU4NCIsInZlcnNpb24iOjF9.ql1rCId6lQ7Uwq2spG3q2fFppkFGHA1IWQjvyPRhvKdRNzApBO0mu9JjMAv4uNKZX-kmGEkI018_9tAzN7kwDw
          - type: f1
            value: 91.6201
            name: F1
            verified: true
            verifyToken: >-
              eyJhbGciOiJFZERTQSIsInR5cCI6IkpXVCJ9.eyJoYXNoIjoiZjBjMmI0OTFmODVjMzllZDM0NTdmNjU4NGI4NzA4NTJhOWVkMDQ5OTY0MDcyMWEwZTFkODNlY2VhZjU2NWJmZSIsInZlcnNpb24iOjF9.rGvF60bfWIXzB66C7fkdxCtZvRZ_m3onbLaNbs7M4M0Fk27xnMat6IAy1DeTztkOKLoiD2s2NQH6wXid83cgCw

Deberta-v3-xsmall-squad2

What is SQuAD?

The Stanford Question Answering Dataset (SQuAD) is a reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. The answer to every question is a segment of text (a span) from the corresponding reading passage, or the question may be unanswerable.
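To make the span formulation concrete, here is a simplified record in the layout used by the Hugging Face `squad` and `squad_v2` datasets. Each answer is stored as its text plus a character offset (`answer_start`) into the context, and an unanswerable SQuAD 2.0 question has empty answer lists. The `gold_spans` helper and the second question are hypothetical illustrations, and the `id` and `title` fields are omitted.

```python
from typing import TypedDict


class Answers(TypedDict):
    text: list[str]
    answer_start: list[int]  # character offsets into the context


class SquadExample(TypedDict):
    context: str
    question: str
    answers: Answers


def gold_spans(example: SquadExample) -> list[str]:
    """Recover each gold answer by slicing the context at its character offset.

    An empty result means the question is unanswerable (SQuAD 2.0 style).
    """
    spans = []
    answers = example["answers"]
    for text, start in zip(answers["text"], answers["answer_start"]):
        span = example["context"][start : start + len(text)]
        # A SQuAD answer must be a verbatim substring of the passage.
        if span != text:
            raise ValueError(f"offset mismatch: {span!r} != {text!r}")
        spans.append(span)
    return spans


answerable: SquadExample = {
    "context": "My name is Sarah and I live in London",
    "question": "Where do I live?",
    "answers": {"text": ["London"], "answer_start": [31]},
}
# Hypothetical unanswerable question: the passage never says where Sarah works.
unanswerable: SquadExample = {
    "context": "My name is Sarah and I live in London",
    "question": "Where does Sarah work?",
    "answers": {"text": [], "answer_start": []},
}

print(gold_spans(answerable))    # ['London']
print(gold_spans(unanswerable))  # []
```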

SQuAD2.0 combines the 100,000 questions in SQuAD1.1 with over 50,000 unanswerable questions written adversarially by crowdworkers to look similar to answerable ones. To do well on SQuAD2.0, systems must not only answer questions when possible, but also determine when no answer is supported by the paragraph and abstain from answering.
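One common way a span-extraction model abstains on SQuAD 2.0, described for this dataset in the original BERT paper, is to compare the best span score against a "null" score read off the first ([CLS]) position, and to answer only when the span beats it by more than a threshold tuned on development data. The sketch below assumes per-token start and end logits with the null position at index 0 and ignores details such as masking out question tokens; it illustrates the idea and is not necessarily how this model or the `transformers` pipeline makes the decision. The function names, `max_answer_len`, and the toy logits are hypothetical.

```python
def best_span(start_logits, end_logits, max_answer_len=30):
    """Return (score, i, j) for the highest-scoring span with 1 <= i <= j.

    Index 0 is skipped because it is reserved for the null ("no answer") position.
    """
    best = (float("-inf"), -1, -1)
    n = len(start_logits)
    for i in range(1, n):
        for j in range(i, min(n, i + max_answer_len)):
            score = start_logits[i] + end_logits[j]
            if score > best[0]:
                best = (score, i, j)
    return best


def predict(start_logits, end_logits, threshold=0.0):
    """Return the (i, j) token indices of the answer, or None to abstain."""
    null_score = start_logits[0] + end_logits[0]
    score, i, j = best_span(start_logits, end_logits)
    # Answer only if the best real span beats "no answer" by more than the threshold.
    if score > null_score + threshold:
        return (i, j)
    return None


tokens = ["[CLS]", "I", "live", "in", "London"]
span = predict([1.0, 0.1, 0.2, 0.3, 4.0], [1.0, 0.1, 0.2, 0.3, 4.0])
print(tokens[span[0] : span[1] + 1] if span else "<no answer>")  # ['London']
```

Raising `threshold` makes the model abstain more often, which trades recall on answerable questions for precision on unanswerable ones.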

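Inference

```python
from transformers import pipeline

# Load the question-answering pipeline with this model from the Hugging Face Hub
qa = pipeline("question-answering", model="nlpconnect/deberta-v3-xsmall-squad2")

# Extract an answer span from the context; pass handle_impossible_answer=True
# to allow an empty answer for unanswerable (SQuAD 2.0-style) questions
result = qa(context="My name is Sarah and I live in London", question="Where do I live?")
print(result)  # dict with 'score', 'start', 'end' and 'answer' keys
```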

Accuracy

```python
# Validation-split scores, matching the model-index metadata above
squad_v2 = {'exact': 79.392, 'f1': 82.674}

squad = {'exact': 84.925, 'f1': 91.620}
```
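The `exact` and `f1` keys above are the two standard SQuAD metrics. The sketch below reimplements them following the normalization of the official SQuAD evaluation script (lowercase, strip punctuation and the articles a/an/the, collapse whitespace), including the SQuAD 2.0 rule that an empty prediction scores full credit only on an unanswerable question. It is an illustration; the card does not say which evaluation harness produced the numbers above, and the `score` helper and toy data are hypothetical.

```python
import collections
import re
import string

_PUNCT = set(string.punctuation)


def normalize_answer(s: str) -> str:
    """Lowercase, strip punctuation and English articles, collapse whitespace."""
    s = s.lower()
    s = "".join(ch for ch in s if ch not in _PUNCT)
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(gold: str, pred: str) -> int:
    return int(normalize_answer(gold) == normalize_answer(pred))


def f1(gold: str, pred: str) -> float:
    gold_toks = normalize_answer(gold).split()
    pred_toks = normalize_answer(pred).split()
    if not gold_toks or not pred_toks:
        # No-answer case: full credit only if both are empty.
        return float(gold_toks == pred_toks)
    common = collections.Counter(gold_toks) & collections.Counter(pred_toks)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_toks)
    recall = num_same / len(gold_toks)
    return 2 * precision * recall / (precision + recall)


def score(golds: list[list[str]], preds: list[str]) -> dict[str, float]:
    """Corpus-level scores in percent; each prediction takes its best match over the gold answers."""
    em_total = f1_total = 0.0
    for gold_answers, pred in zip(golds, preds):
        # An unanswerable question is scored against a single empty answer.
        gold_answers = [g for g in gold_answers if normalize_answer(g)] or [""]
        em_total += max(exact_match(g, pred) for g in gold_answers)
        f1_total += max(f1(g, pred) for g in gold_answers)
    n = len(preds)
    return {"exact": 100.0 * em_total / n, "f1": 100.0 * f1_total / n}


print(score([["London"], ["the city of London"], []], ["london", "London", ""]))
```

In the toy run, the second prediction misses exact match but earns partial F1 (one of three gold tokens), and the empty prediction on the unanswerable question gets full credit on both metrics.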