20230819211604
This model is a fine-tuned version of bert-large-cased on the super_glue dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3362
- Accuracy: 0.7473
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.004
- train_batch_size: 8
- eval_batch_size: 8
- seed: 11
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 60.0
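As a rough illustration of what the `linear` scheduler implies for these settings, the sketch below computes the learning rate at a few points in training. The 312 steps per epoch are read off the results table. The warmup length is an assumption (the card lists none, so it is set to zero here), and `linear_lr` is a hypothetical helper, not a library function.

```python
# Values taken from the hyperparameter list and the results table.
PEAK_LR = 0.004
NUM_EPOCHS = 60
STEPS_PER_EPOCH = 312                        # step count logged at epoch 1.0
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH   # 18720, the final logged step

# Assumption: no warmup, because the card does not list any.
WARMUP_STEPS = 0


def linear_lr(step: int) -> float:
    """Learning rate after `step` optimizer updates under linear warmup and decay."""
    if step < WARMUP_STEPS:
        # Ramp up linearly from 0 to the peak rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Decay linearly from the peak rate to 0 at the last step.
    remaining = max(0, TOTAL_STEPS - step)
    return PEAK_LR * remaining / max(1, TOTAL_STEPS - WARMUP_STEPS)


for epoch in (0, 15, 30, 45, 60):
    print(f"epoch {epoch:2d}: lr = {linear_lr(epoch * STEPS_PER_EPOCH):.6f}")
```

Under these assumptions the rate starts at the peak value of 0.004, halves by epoch 30, and reaches zero at step 18720.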
Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|:-------------:|:-----:|:----:|:---------------:|:--------:|
| No log | 1.0 | 312 | 0.4002 | 0.5307 |
| 0.545 | 2.0 | 624 | 0.4058 | 0.5379 |
| 0.545 | 3.0 | 936 | 0.3972 | 0.5379 |
| 0.4698 | 4.0 | 1248 | 0.4360 | 0.4729 |
| 0.4785 | 5.0 | 1560 | 0.3494 | 0.5090 |
| 0.4785 | 6.0 | 1872 | 0.4100 | 0.4729 |
| 0.4322 | 7.0 | 2184 | 0.5717 | 0.5307 |
| 0.4322 | 8.0 | 2496 | 0.4078 | 0.5379 |
| 0.3946 | 9.0 | 2808 | 0.3304 | 0.6570 |
| 0.36 | 10.0 | 3120 | 0.3318 | 0.6426 |
| 0.36 | 11.0 | 3432 | 0.3275 | 0.6931 |
| 0.3478 | 12.0 | 3744 | 0.3314 | 0.7148 |
| 0.3359 | 13.0 | 4056 | 0.3277 | 0.7112 |
| 0.3359 | 14.0 | 4368 | 0.3307 | 0.7148 |
| 0.3249 | 15.0 | 4680 | 0.3245 | 0.6968 |
| 0.3249 | 16.0 | 4992 | 0.3626 | 0.6498 |
| 0.3253 | 17.0 | 5304 | 0.3567 | 0.6859 |
| 0.3155 | 18.0 | 5616 | 0.3279 | 0.7112 |
| 0.3155 | 19.0 | 5928 | 0.3257 | 0.7256 |
| 0.3145 | 20.0 | 6240 | 0.3337 | 0.7112 |
| 0.3051 | 21.0 | 6552 | 0.3289 | 0.7365 |
| 0.3051 | 22.0 | 6864 | 0.3523 | 0.6931 |
| 0.3015 | 23.0 | 7176 | 0.3459 | 0.7040 |
| 0.3015 | 24.0 | 7488 | 0.3323 | 0.7076 |
| 0.2952 | 25.0 | 7800 | 0.3445 | 0.7329 |
| 0.289 | 26.0 | 8112 | 0.3554 | 0.7329 |
| 0.289 | 27.0 | 8424 | 0.3210 | 0.7292 |
| 0.2876 | 28.0 | 8736 | 0.3204 | 0.7365 |
| 0.2862 | 29.0 | 9048 | 0.3374 | 0.7509 |
| 0.2862 | 30.0 | 9360 | 0.3778 | 0.7112 |
| 0.2814 | 31.0 | 9672 | 0.3352 | 0.7401 |
| 0.2814 | 32.0 | 9984 | 0.3251 | 0.7256 |
| 0.2777 | 33.0 | 10296 | 0.3574 | 0.7617 |
| 0.2698 | 34.0 | 10608 | 0.3330 | 0.7292 |
| 0.2698 | 35.0 | 10920 | 0.3388 | 0.7220 |
| 0.2714 | 36.0 | 11232 | 0.3222 | 0.7329 |
| 0.2695 | 37.0 | 11544 | 0.3482 | 0.7473 |
| 0.2695 | 38.0 | 11856 | 0.3447 | 0.7437 |
| 0.2637 | 39.0 | 12168 | 0.3394 | 0.7401 |
| 0.2637 | 40.0 | 12480 | 0.3264 | 0.7401 |
| 0.2646 | 41.0 | 12792 | 0.3311 | 0.7401 |
| 0.2613 | 42.0 | 13104 | 0.3322 | 0.7365 |
| 0.2613 | 43.0 | 13416 | 0.3411 | 0.7473 |
| 0.2539 | 44.0 | 13728 | 0.3298 | 0.7581 |
| 0.2543 | 45.0 | 14040 | 0.3442 | 0.7437 |
| 0.2543 | 46.0 | 14352 | 0.3399 | 0.7545 |
| 0.2516 | 47.0 | 14664 | 0.3330 | 0.7473 |
| 0.2516 | 48.0 | 14976 | 0.3299 | 0.7473 |
| 0.2509 | 49.0 | 15288 | 0.3407 | 0.7401 |
| 0.2484 | 50.0 | 15600 | 0.3268 | 0.7581 |
| 0.2484 | 51.0 | 15912 | 0.3386 | 0.7509 |
| 0.2491 | 52.0 | 16224 | 0.3323 | 0.7581 |
| 0.2483 | 53.0 | 16536 | 0.3448 | 0.7473 |
| 0.2483 | 54.0 | 16848 | 0.3339 | 0.7545 |
| 0.2452 | 55.0 | 17160 | 0.3343 | 0.7473 |
| 0.2452 | 56.0 | 17472 | 0.3408 | 0.7509 |
| 0.2456 | 57.0 | 17784 | 0.3374 | 0.7545 |
| 0.2429 | 58.0 | 18096 | 0.3360 | 0.7473 |
| 0.2429 | 59.0 | 18408 | 0.3345 | 0.7545 |
| 0.2436 | 60.0 | 18720 | 0.3362 | 0.7473 |
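The reported evaluation metrics (loss 0.3362, accuracy 0.7473) match the final epoch's row rather than the best one in the log. The sketch below shows one way to compare checkpoints, using a handful of rows copied from the results above. `EvalRow` is a hypothetical container introduced only for illustration, and the card does not say whether any intermediate checkpoint was saved.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    val_loss: float
    accuracy: float


# A few rows copied from the training results (not the full log).
rows = [
    EvalRow(28, 0.3204, 0.7365),   # lowest validation loss in the log
    EvalRow(33, 0.3574, 0.7617),   # highest accuracy in the log
    EvalRow(44, 0.3298, 0.7581),
    EvalRow(60, 0.3362, 0.7473),   # final epoch, matches the reported metrics
]

best_by_accuracy = max(rows, key=lambda r: r.accuracy)
best_by_loss = min(rows, key=lambda r: r.val_loss)
final = rows[-1]

print("best accuracy:", best_by_accuracy)
print("lowest val loss:", best_by_loss)
print("accuracy gap, final vs best:", round(best_by_accuracy.accuracy - final.accuracy, 4))
```

The two selection criteria pick different epochs here (33 by accuracy, 28 by validation loss), so which checkpoint counts as "best" depends on the metric chosen.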
Framework versions
- Transformers 4.30.0
- Pytorch 2.0.1
- Datasets 2.14.4
- Tokenizers 0.13.3