---
license: mit
base_model: microsoft/deberta-v3-large
tags:
- generated_from_trainer
datasets:
- squad_v2
model-index:
- name: deberta-v3-large-finetuned-squadv2
  results: []
---

# deberta-v3-large-finetuned-squadv2

This model is a fine-tuned version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) on the squad_v2 dataset.
It achieves the following results on the evaluation set:
- Loss: 0.5579

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 5200

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.5293 | 1.57 | 3200 | 0.5739 |
| 0.5106 | 1.58 | 3220 | 0.5783 |
| 0.5338 | 1.59 | 3240 | 0.5718 |
| 0.5128 | 1.6 | 3260 | 0.5827 |
| 0.5205 | 1.61 | 3280 | 0.6045 |
| 0.5114 | 1.62 | 3300 | 0.5880 |
| 0.5072 | 1.63 | 3320 | 0.5788 |
| 0.5512 | 1.64 | 3340 | 0.5863 |
| 0.4723 | 1.65 | 3360 | 0.5898 |
| 0.5011 | 1.66 | 3380 | 0.5917 |
| 0.5419 | 1.67 | 3400 | 0.6027 |
| 0.5425 | 1.68 | 3420 | 0.5699 |
| 0.5703 | 1.69 | 3440 | 0.5897 |
| 0.4646 | 1.7 | 3460 | 0.5917 |
| 0.4652 | 1.71 | 3480 | 0.5745 |
| 0.5323 | 1.72 | 3500 | 0.5860 |
| 0.5129 | 1.73 | 3520 | 0.5656 |
| 0.5441 | 1.74 | 3540 | 0.5642 |
| 0.5624 | 1.75 | 3560 | 0.5873 |
| 0.4645 | 1.76 | 3580 | 0.5891 |
| 0.5577 | 1.77 | 3600 | 0.5816 |
| 0.5199 | 1.78 | 3620 | 0.5579 |
| 0.5061 | 1.79 | 3640 | 0.5837 |
| 0.484 | 1.79 | 3660 | 0.5721 |
| 0.5095 | 1.8 | 3680 | 0.5821 |
| 0.5342 | 1.81 | 3700 | 0.5602 |
| 0.5435 | 1.82 | 3720 | 0.5911 |
| 0.5288 | 1.83 | 3740 | 0.5647 |
| 0.5476 | 1.84 | 3760 | 0.5733 |
| 0.5199 | 1.85 | 3780 | 0.5675 |
| 0.5067 | 1.86 | 3800 | 0.5839 |
| 0.5418 | 1.87 | 3820 | 0.5757 |
| 0.4965 | 1.88 | 3840 | 0.5764 |
| 0.5273 | 1.89 | 3860 | 0.5906 |
| 0.5808 | 1.9 | 3880 | 0.5762 |
| 0.5161 | 1.91 | 3900 | 0.5612 |
| 0.4863 | 1.92 | 3920 | 0.5804 |
| 0.4827 | 1.93 | 3940 | 0.5841 |
| 0.4643 | 1.94 | 3960 | 0.5822 |
| 0.5029 | 1.95 | 3980 | 0.6052 |
| 0.509 | 1.96 | 4000 | 0.5800 |
| 0.5382 | 1.97 | 4020 | 0.5645 |
| 0.469 | 1.98 | 4040 | 0.5685 |
| 0.5032 | 1.99 | 4060 | 0.5779 |
| 0.5171 | 2.0 | 4080 | 0.5686 |
| 0.3938 | 2.01 | 4100 | 0.5889 |
| 0.4321 | 2.02 | 4120 | 0.6039 |
| 0.4185 | 2.03 | 4140 | 0.5996 |
| 0.4782 | 2.04 | 4160 | 0.5800 |
| 0.424 | 2.05 | 4180 | 0.6374 |
| 0.3766 | 2.06 | 4200 | 0.6096 |
| 0.415 | 2.07 | 4220 | 0.6221 |
| 0.4352 | 2.08 | 4240 | 0.6150 |
| 0.4336 | 2.09 | 4260 | 0.6055 |
| 0.4289 | 2.1 | 4280 | 0.6138 |
| 0.4433 | 2.11 | 4300 | 0.5946 |
| 0.4478 | 2.12 | 4320 | 0.6118 |
| 0.4787 | 2.13 | 4340 | 0.5969 |
| 0.4432 | 2.14 | 4360 | 0.6048 |
| 0.4319 | 2.15 | 4380 | 0.5948 |
| 0.3939 | 2.16 | 4400 | 0.6116 |
| 0.3921 | 2.17 | 4420 | 0.6082 |
| 0.4381 | 2.18 | 4440 | 0.6282 |
| 0.4461 | 2.19 | 4460 | 0.6084 |
| 0.4012 | 2.2 | 4480 | 0.6092 |
| 0.3849 | 2.21 | 4500 | 0.6152 |
| 0.4178 | 2.22 | 4520 | 0.6004 |
| 0.4163 | 2.23 | 4540 | 0.6059 |
| 0.4006 | 2.24 | 4560 | 0.6115 |
| 0.4225 | 2.25 | 4580 | 0.6130 |
| 0.4008 | 2.26 | 4600 | 0.6095 |
| 0.4706 | 2.27 | 4620 | 0.6136 |
| 0.3902 | 2.28 | 4640 | 0.6103 |
| 0.4048 | 2.29 | 4660 | 0.6085 |
| 0.4411 | 2.3 | 4680 | 0.6139 |
| 0.403 | 2.31 | 4700 | 0.6047 |
| 0.4799 | 2.31 | 4720 | 0.6043 |
| 0.4316 | 2.32 | 4740 | 0.5960 |
| 0.4198 | 2.33 | 4760 | 0.6031 |
| 0.4254 | 2.34 | 4780 | 0.6033 |
| 0.387 | 2.35 | 4800 | 0.6120 |
| 0.3882 | 2.36 | 4820 | 0.6128 |
| 0.4307 | 2.37 | 4840 | 0.6150 |
| 0.434 | 2.38 | 4860 | 0.6077 |
| 0.4225 | 2.39 | 4880 | 0.6071 |
| 0.4134 | 2.4 | 4900 | 0.6036 |
| 0.3846 | 2.41 | 4920 | 0.6124 |
| 0.3943 | 2.42 | 4940 | 0.6291 |
| 0.4455 | 2.43 | 4960 | 0.6185 |
| 0.4104 | 2.44 | 4980 | 0.6064 |
| 0.4158 | 2.45 | 5000 | 0.6095 |
| 0.4135 | 2.46 | 5020 | 0.6155 |
| 0.3789 | 2.47 | 5040 | 0.6209 |
| 0.418 | 2.48 | 5060 | 0.6106 |
| 0.3931 | 2.49 | 5080 | 0.6047 |
| 0.4289 | 2.5 | 5100 | 0.6055 |
| 0.4051 | 2.51 | 5120 | 0.6084 |
| 0.4217 | 2.52 | 5140 | 0.6118 |
| 0.3843 | 2.53 | 5160 | 0.6139 |
| 0.4435 | 2.54 | 5180 | 0.6126 |
| 0.4274 | 2.55 | 5200 | 0.6120 |

### Framework versions

- Transformers 4.35.0.dev0
- Pytorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0
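
### Worked example: effective batch size and learning-rate schedule

The training hyperparameters listed in this card imply two quantities that are easy to check by hand: the total batch size per optimizer step, and the learning rate at any given step under a linear schedule with warmup. The following is a minimal sketch in plain Python, not the training script itself. The function names are hypothetical, the single-device default is an assumption (it is the value that makes 8 × 8 equal the reported 64), and the schedule formula assumes the usual linear warmup from zero followed by linear decay to zero at the final step.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 1e-05
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 8
WARMUP_STEPS = 1000
TRAINING_STEPS = 5200


def effective_batch_size(per_device: int, accumulation: int, num_devices: int = 1) -> int:
    """Examples consumed per optimizer step.

    num_devices=1 is an assumption; the card does not state the device count.
    """
    return per_device * accumulation * num_devices


def linear_lr(step: int) -> float:
    """Learning rate at an optimizer step, assuming linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Warmup phase: ramp linearly from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Decay phase: fall linearly from the peak to 0 at TRAINING_STEPS.
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)


if __name__ == "__main__":
    print("effective batch size:",
          effective_batch_size(TRAIN_BATCH_SIZE, GRADIENT_ACCUMULATION_STEPS))
    # Step 3620 is the row whose validation loss equals the reported 0.5579.
    for step in (0, 500, 1000, 3620, 5200):
        print(f"step {step:>4}: lr = {linear_lr(step):.3e}")
```

Under these assumptions the learning rate peaks at step 1000 and has already decayed substantially by step 3620, the row in the results table whose validation loss matches the reported evaluation loss.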