This model is a version of [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) fine-tuned for extractive question answering on the SQuAD 2.0 dataset.

Fine-tuning & detailed evaluation on an NVIDIA Titan RTX (24GB) GPU took 15 hours.

## Results from the 2023 ICLR paper, "DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing", by Pengcheng He et al.

- 'EM': 89.0
- 'F1': 91.5

## Results from this fine-tuning

- 'exact': 88.70
- 'f1': 91.52
- 'total': 11873
- 'HasAns_exact': 83.70
- 'HasAns_f1': 89.35
- 'HasAns_total': 5928
- 'NoAns_exact': 93.68
- 'NoAns_f1': 93.68
- 'NoAns_total': 5945
- 'best_exact': 88.70
- 'best_exact_thresh': 0.0
- 'best_f1': 91.52
- 'best_f1_thresh': 0.0
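
These figures follow the output format of the standard SQuAD 2.0 evaluation, e.g. the `squad_v2` metric in the Hugging Face `evaluate` library. A minimal sketch of how such numbers are computed; the prediction/reference pair below is illustrative, not taken from this model's run:

```python
from evaluate import load

# SQuAD 2.0 metric; its output keys match the list above
# ('exact', 'f1', 'total', 'HasAns_*', 'NoAns_*', 'best_*').
squad_v2 = load("squad_v2")

# Illustrative single example, not from this model's actual predictions.
predictions = [{
    "id": "56ddde6b9a695914005b9628",
    "prediction_text": "France",
    "no_answer_probability": 0.0,
}]
references = [{
    "id": "56ddde6b9a695914005b9628",
    "answers": {"text": ["France"], "answer_start": [159]},
}]

print(squad_v2.compute(predictions=predictions, references=references))
```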

## Model description

For the authors' models, code & detailed information, see: https://github.com/microsoft/DeBERTa

### Intended use

Extractive question answering on a given context.
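
A minimal usage sketch with the `transformers` question-answering pipeline; the model ID below is a placeholder for this repository's checkpoint:

```python
from transformers import pipeline

# Placeholder model ID; substitute this repository's checkpoint name.
qa = pipeline("question-answering", model="<this-repo-id>")

context = (
    "DeBERTaV3 improves DeBERTa using ELECTRA-style pre-training "
    "with gradient-disentangled embedding sharing."
)
# handle_impossible_answer lets the pipeline return an empty answer,
# matching the NoAns cases in the evaluation above.
print(qa(question="What does DeBERTaV3 improve?",
         context=context,
         handle_impossible_answer=True))
```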

### Fine-tuning hyperparameters

The following hyperparameters, as suggested by the 2023 ICLR paper noted above, were used during fine-tuning (a `TrainingArguments` sketch follows the list):

- learning_rate: 1e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 1000
- training_steps: 5200
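
A minimal sketch mirroring these settings in `transformers`; the output directory is a placeholder, and mapping `training_steps` to `max_steps` is an assumption about how the run was configured:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="deberta-v3-large-qa",  # placeholder path
    learning_rate=1e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=8,  # 8 x 8 = total train batch size 64
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    max_steps=5200,  # 'training_steps' above, assumed to map to max_steps
)
```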

### Framework versions

- Transformers 4.35.0.dev0
- PyTorch 2.1.0+cu121
- Datasets 2.14.5
- Tokenizers 0.14.0

### System

- CPU: Intel(R) Core(TM) i9-9900K - 32GB RAM
- Python version: 3.11.5 [GCC 11.2.0] (64-bit runtime)
- Python platform: Linux-5.15.0-86-generic-x86_64-with-glibc2.35
- GPU: NVIDIA TITAN RTX - 24GB memory
- CUDA runtime version: 12.1.105
- NVIDIA driver version: 535.113.01

### Fine-tuning (Training) results before/after the best model (Step 3620)

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|