---
license: mit
tags:
- generated_from_trainer
datasets:
- squad_v2
base_model: microsoft/deberta-v3-large
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      name: SQuAD2.0
      type: squad_v2
      split: validation[:11873]
    metrics:
    - type: exact
      value: 88.69704371262529
      name: eval_exact
    - type: f1
      value: 91.51550564529175
      name: eval_f1
    - type: HasAns_exact
      value: 83.70445344129554
      name: HasAns_exact
    - type: HasAns_f1
      value: 89.34945994037624
      name: HasAns_f1
    - type: HasAns_total
      value: 5928
      name: HasAns_total
    - type: NoAns_exact
      value: 93.6753574432296
      name: NoAns_exact
    - type: NoAns_f1
      value: 93.6753574432296
      name: NoAns_f1
    - type: NoAns_total
      value: 5945
      name: NoAns_total
---

# deberta-v3-large-finetuned-squadv2

This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned on the SQuAD version 2.0 dataset. Fine-tuning and evaluation on an NVIDIA Titan RTX (24GB) GPU took 15 hours.

## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He et al.

- 'EM' : 89.0
- 'F1' : 91.5

## Results for this fine-tuning

Calculated with the snippet below (a sketch of how `formatted_predictions` and `references` are typically shaped appears at the end of this card):

```python
import evaluate

# formatted_predictions / references: post-processed model outputs and gold answers
metrics = evaluate.load("squad_v2")
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)
```

- 'exact' : 88.70
- 'f1' : 91.52
- 'total' : 11873
- 'HasAns_exact' : 83.70
- 'HasAns_f1' : 89.35
- 'HasAns_total' : 5928
- 'NoAns_exact' : 93.68
- 'NoAns_f1' : 93.68
- 'NoAns_total' : 5945
- 'best_exact' : 88.70
- 'best_exact_thresh' : 0.0
- 'best_f1' : 91.52
- 'best_f1_thresh' : 0.0

## Model description

For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa

## Intended uses

Extractive question answering on a given context (a usage sketch appears at the end of this card).

### Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (see the `TrainingArguments` sketch at the end of this card):
- learning_rate : 1e-05
- train_batch_size : 8
- eval_batch_size : 8
- seed : 42
- gradient_accumulation_steps : 8
- total_train_batch_size : 64
- optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
- lr_scheduler_type : linear
- lr_scheduler_warmup_steps : 1000
- training_steps : 5200

### Framework versions

- Transformers : 4.35.0.dev0
- Pytorch : 2.1.0+cu121
- Datasets : 2.14.5
- Tokenizers : 0.14.0

### System

- CPU : Intel(R) Core(TM) i9-9900K
- RAM : 32GB
- Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU : NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version : 12.1.105
- Nvidia driver version : 535.113.01

### Fine-tuning (training) results before/after the best model (step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |
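
## Evaluation input format (sketch)

The evaluation snippet earlier in this card assumes `formatted_predictions` and `references` have already been built. As a minimal, illustrative sketch (the id, answer text, and character offset below are made up), the `squad_v2` metric from the `evaluate` library expects each prediction to carry an `id`, a `prediction_text`, and a `no_answer_probability`, and each reference to carry the gold `answers`:

```python
import evaluate

# Illustrative values only: the id, answer text, and answer_start below are made up.
formatted_predictions = [
    {"id": "56ddde6b9a695914005b9628", "prediction_text": "France", "no_answer_probability": 0.0}
]
references = [
    {"id": "56ddde6b9a695914005b9628", "answers": {"text": ["France"], "answer_start": [159]}}
]

metrics = evaluate.load("squad_v2")
print(metrics.compute(predictions=formatted_predictions, references=references))
```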
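
## Fine-tuning configuration (sketch)

One possible way to express the hyperparameters listed above with the `transformers` `TrainingArguments` API. This is an assumed mapping for illustration, not the original training script, and the output directory name is a placeholder:

```python
from transformers import TrainingArguments

# Assumed mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 x 8 = total train batch size of 64
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
)
```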
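
## Example usage (sketch)

A minimal inference sketch using the `transformers` question-answering pipeline. The model identifier below is a placeholder; point it at the repository where this checkpoint is stored:

```python
from transformers import pipeline

# The model id below is a placeholder for this checkpoint's repository path.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

context = (
    "The Statue of Liberty was a gift from the people of France "
    "to the people of the United States."
)
result = qa(question="Who gave the Statue of Liberty to the United States?", context=context)
print(result["answer"], result["score"])
```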