---
license: mit
tags:
- generated_from_trainer
datasets:
- squad_v2
base_model: microsoft/deberta-v3-large
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results:
  - task:
      type: question-answering
      name: Extractive Question Answering
    dataset:
      name: SQuAD2.0
      type: squad_v2
      split: validation[:11873]
    metrics:
    - type: exact
      value: 88.69704371262529
      name: eval_exact
    - type: f1
      value: 91.51550564529175
      name: eval_f1
    - type: HasAns_exact
      value: 83.70445344129554
      name: HasAns_exact
    - type: HasAns_f1
      value: 89.34945994037624
      name: HasAns_f1
    - type: HasAns_total
      value: 5928
      name: HasAns_total
    - type: NoAns_exact
      value: 93.6753574432296
      name: NoAns_exact
    - type: NoAns_f1
      value: 93.6753574432296
      name: NoAns_f1
    - type: NoAns_total
      value: 5945
      name: NoAns_total
---

# deberta-v3-large-finetuned-squadv2

This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned on the SQuAD version 2.0 dataset. Fine-tuning and evaluation on an NVIDIA Titan RTX (24GB) GPU took 15 hours.

## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He et al.

- 'EM' : 89.0
- 'F1' : 91.5

## Results for this fine-tuning

Calculated with the snippet below (a sketch of how `formatted_predictions` and `references` are typically shaped appears at the end of this card):

```python
import evaluate

# formatted_predictions / references: post-processed model outputs and gold answers
metrics = evaluate.load("squad_v2")
squad_v2_metrics = metrics.compute(predictions=formatted_predictions, references=references)
```

- 'exact' : 88.70
- 'f1' : 91.52
- 'total' : 11873
- 'HasAns_exact' : 83.70
- 'HasAns_f1' : 89.35
- 'HasAns_total' : 5928
- 'NoAns_exact' : 93.68
- 'NoAns_f1' : 93.68
- 'NoAns_total' : 5945
- 'best_exact' : 88.70
- 'best_exact_thresh' : 0.0
- 'best_f1' : 91.52
- 'best_f1_thresh' : 0.0

## Model description

For the authors' models, code & detailed information see: https://github.com/microsoft/DeBERTa

## Intended uses

Extractive question answering on a given context (a usage sketch appears at the end of this card).

### Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (see the `TrainingArguments` sketch at the end of this card):
- learning_rate : 1e-05
- train_batch_size : 8
- eval_batch_size : 8
- seed : 42
- gradient_accumulation_steps : 8
- total_train_batch_size : 64
- optimizer : Adam with betas = (0.9, 0.999) and epsilon = 1e-06
- lr_scheduler_type : linear
- lr_scheduler_warmup_steps : 1000
- training_steps : 5200

### Framework versions

- Transformers : 4.35.0.dev0
- Pytorch : 2.1.0+cu121
- Datasets : 2.14.5
- Tokenizers : 0.14.0

### System

- CPU : Intel(R) Core(TM) i9-9900K
- RAM : 32GB
- Python version : 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform : Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU : NVIDIA TITAN RTX - 24GB Memory
- CUDA runtime version : 12.1.105
- Nvidia driver version : 535.113.01

### Fine-tuning (training) results before/after the best model (step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5323        | 1.72  | 3500 | 0.5860          |
| 0.5129        | 1.73  | 3520 | 0.5656          |
| 0.5441        | 1.74  | 3540 | 0.5642          |
| 0.5624        | 1.75  | 3560 | 0.5873          |
| 0.4645        | 1.76  | 3580 | 0.5891          |
| 0.5577        | 1.77  | 3600 | 0.5816          |
| 0.5199        | 1.78  | 3620 | 0.5579          |
| 0.5061        | 1.79  | 3640 | 0.5837          |
| 0.484         | 1.79  | 3660 | 0.5721          |
| 0.5095        | 1.8   | 3680 | 0.5821          |
| 0.5342        | 1.81  | 3700 | 0.5602          |
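
## Evaluation input format (sketch)

The evaluation snippet earlier in this card assumes `formatted_predictions` and `references` have already been built. As a minimal, illustrative sketch (the id, answer text, and character offset below are made up), the `squad_v2` metric from the `evaluate` library expects each prediction to carry an `id`, a `prediction_text`, and a `no_answer_probability`, and each reference to carry the gold `answers`:

```python
import evaluate

# Illustrative values only: the id, answer text, and answer_start below are made up.
formatted_predictions = [
    {"id": "56ddde6b9a695914005b9628", "prediction_text": "France", "no_answer_probability": 0.0}
]
references = [
    {"id": "56ddde6b9a695914005b9628", "answers": {"text": ["France"], "answer_start": [159]}}
]

metrics = evaluate.load("squad_v2")
print(metrics.compute(predictions=formatted_predictions, references=references))
```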
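
## Fine-tuning configuration (sketch)

One possible way to express the hyperparameters listed above with the `transformers` `TrainingArguments` API. This is an assumed mapping for illustration, not the original training script, and the output directory name is a placeholder:

```python
from transformers import TrainingArguments

# Assumed mapping of the listed hyperparameters onto TrainingArguments.
training_args = TrainingArguments(
    output_dir="deberta-v3-large-finetuned-squadv2",  # hypothetical output directory
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 x 8 = total train batch size of 64
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,
)
```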
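
## Example usage (sketch)

A minimal inference sketch using the `transformers` question-answering pipeline. The model identifier below is a placeholder; point it at the repository where this checkpoint is stored:

```python
from transformers import pipeline

# The model id below is a placeholder for this checkpoint's repository path.
qa = pipeline("question-answering", model="deberta-v3-large-finetuned-squadv2")

context = (
    "The Statue of Liberty was a gift from the people of France "
    "to the people of the United States."
)
result = qa(question="Who gave the Statue of Liberty to the United States?", context=context)
print(result["answer"], result["score"])
```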