---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: dit-small_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# dit-small_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix

This model is a fine-tuned version of [microsoft/dit-base](https://huggingface.co/microsoft/dit-base). The training dataset is not recorded in the auto-generated card; the model name suggests a 100-examples-per-class subset of RVL-CDIP trained with MSE-based knowledge distillation.
It achieves the following results on the evaluation set (a hedged inference sketch follows the metric list):
- Loss: 1.8796
- Accuracy: 0.26
- Brier Loss: 0.8768
- NLL (negative log-likelihood): 6.0962
- F1 Micro: 0.26
- F1 Macro: 0.2480
- ECE (expected calibration error): 0.2002
- AURC (area under the risk-coverage curve): 0.5815
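
Since the description and intended-uses sections below are placeholders, here is a minimal inference sketch. It assumes the checkpoint is published on the Hugging Face Hub under the model-index name above and follows the standard `transformers` image-classification API; the repository id and the image path are illustrative, not confirmed by the card.

```python
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

# Hypothetical hub id taken from the model-index name above; replace with the actual repository path.
checkpoint = "dit-small_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix"

processor = AutoImageProcessor.from_pretrained(checkpoint)
model = AutoModelForImageClassification.from_pretrained(checkpoint)
model.eval()

# Illustrative input: a scanned document page as an RGB image.
image = Image.open("document_page.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```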

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `TrainingArguments` sketch is included after the list):
- learning_rate: 0.0001
- train_batch_size: 128
- eval_batch_size: 128
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100
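
A rough reconstruction of these settings with the `transformers` `Trainer` API might look like the following. This is a sketch based only on the values listed above; dataset loading, the student/teacher models, and the distillation loss are omitted because the card does not record them, and the per-epoch evaluation is assumed from the results table.

```python
from transformers import TrainingArguments, Trainer

# Sketch of the reported hyperparameters; the Adam betas/epsilon are the values
# the card lists explicitly.
training_args = TrainingArguments(
    output_dir="dit-small_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix",
    learning_rate=1e-4,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumption: the table reports validation metrics every epoch
)

# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=eval_ds,
#                   compute_metrics=compute_metrics)
# trainer.train()
```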

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | NLL     | F1 Micro | F1 Macro | ECE    | AURC   |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:----------:|:-------:|:--------:|:--------:|:------:|:------:|
| No log        | 1.0   | 7    | 1.5365          | 0.065    | 0.9398     | 10.2864 | 0.065    | 0.0116   | 0.1183 | 0.9536 |
| No log        | 2.0   | 14   | 1.5332          | 0.06     | 0.9374     | 9.8468  | 0.06     | 0.0269   | 0.1067 | 0.9096 |
| No log        | 3.0   | 21   | 1.5119          | 0.085    | 0.9352     | 9.1495  | 0.085    | 0.0355   | 0.1135 | 0.8759 |
| No log        | 4.0   | 28   | 1.5040          | 0.0825   | 0.9333     | 8.6549  | 0.0825   | 0.0439   | 0.1181 | 0.8618 |
| No log        | 5.0   | 35   | 1.5021          | 0.1      | 0.9301     | 8.9643  | 0.1000   | 0.0558   | 0.1318 | 0.8030 |
| No log        | 6.0   | 42   | 1.4885          | 0.1      | 0.9276     | 7.8684  | 0.1000   | 0.0505   | 0.1205 | 0.8190 |
| No log        | 7.0   | 49   | 1.4882          | 0.0975   | 0.9254     | 9.4095  | 0.0975   | 0.0584   | 0.1220 | 0.7847 |
| No log        | 8.0   | 56   | 1.4909          | 0.1275   | 0.9227     | 9.4274  | 0.1275   | 0.0827   | 0.1335 | 0.7445 |
| No log        | 9.0   | 63   | 1.4837          | 0.115    | 0.9217     | 10.2918 | 0.115    | 0.0546   | 0.1366 | 0.7932 |
| No log        | 10.0  | 70   | 1.4857          | 0.1125   | 0.9186     | 9.5039  | 0.1125   | 0.0510   | 0.1277 | 0.7749 |
| No log        | 11.0  | 77   | 1.4804          | 0.1125   | 0.9183     | 8.5178  | 0.1125   | 0.0515   | 0.1315 | 0.7831 |
| No log        | 12.0  | 84   | 1.4701          | 0.11     | 0.9177     | 8.2398  | 0.11     | 0.0655   | 0.1310 | 0.7754 |
| No log        | 13.0  | 91   | 1.4721          | 0.16     | 0.9160     | 7.2379  | 0.16     | 0.1155   | 0.1462 | 0.7370 |
| No log        | 14.0  | 98   | 1.4717          | 0.11     | 0.9159     | 8.1355  | 0.11     | 0.0633   | 0.1221 | 0.7579 |
| No log        | 15.0  | 105  | 1.4739          | 0.1325   | 0.9138     | 7.4037  | 0.1325   | 0.0790   | 0.1419 | 0.7358 |
| No log        | 16.0  | 112  | 1.4657          | 0.1425   | 0.9135     | 7.8063  | 0.1425   | 0.0821   | 0.1285 | 0.7269 |
| No log        | 17.0  | 119  | 1.4632          | 0.1375   | 0.9112     | 7.8852  | 0.1375   | 0.0948   | 0.1389 | 0.7342 |
| No log        | 18.0  | 126  | 1.4769          | 0.15     | 0.9081     | 8.5375  | 0.15     | 0.0894   | 0.1399 | 0.7113 |
| No log        | 19.0  | 133  | 1.4547          | 0.1775   | 0.9045     | 6.4114  | 0.1775   | 0.1174   | 0.1507 | 0.7007 |
| No log        | 20.0  | 140  | 1.4470          | 0.1725   | 0.9031     | 8.1696  | 0.1725   | 0.1246   | 0.1464 | 0.7079 |
| No log        | 21.0  | 147  | 1.4615          | 0.19     | 0.9021     | 6.0696  | 0.19     | 0.1390   | 0.1646 | 0.7023 |
| No log        | 22.0  | 154  | 1.4588          | 0.2      | 0.8996     | 6.0038  | 0.2000   | 0.1384   | 0.1628 | 0.6821 |
| No log        | 23.0  | 161  | 1.4646          | 0.1525   | 0.8988     | 7.0678  | 0.1525   | 0.1075   | 0.1458 | 0.7000 |
| No log        | 24.0  | 168  | 1.4491          | 0.2125   | 0.8933     | 5.9276  | 0.2125   | 0.1503   | 0.1533 | 0.6457 |
| No log        | 25.0  | 175  | 1.4526          | 0.205    | 0.8916     | 7.6108  | 0.205    | 0.1479   | 0.1603 | 0.6676 |
| No log        | 26.0  | 182  | 1.4510          | 0.17     | 0.8910     | 5.6337  | 0.17     | 0.1333   | 0.1396 | 0.6868 |
| No log        | 27.0  | 189  | 1.4567          | 0.19     | 0.8850     | 5.2038  | 0.19     | 0.1380   | 0.1637 | 0.6547 |
| No log        | 28.0  | 196  | 1.4570          | 0.2225   | 0.8846     | 6.5368  | 0.2225   | 0.1840   | 0.1701 | 0.6554 |
| No log        | 29.0  | 203  | 1.4701          | 0.2075   | 0.8820     | 5.0057  | 0.2075   | 0.1663   | 0.1719 | 0.6598 |
| No log        | 30.0  | 210  | 1.4693          | 0.2225   | 0.8755     | 7.4456  | 0.2225   | 0.1729   | 0.1626 | 0.6355 |
| No log        | 31.0  | 217  | 1.4670          | 0.23     | 0.8787     | 5.8938  | 0.23     | 0.1904   | 0.1717 | 0.6424 |
| No log        | 32.0  | 224  | 1.4540          | 0.2275   | 0.8756     | 6.6513  | 0.2275   | 0.1673   | 0.1676 | 0.6306 |
| No log        | 33.0  | 231  | 1.4641          | 0.2275   | 0.8649     | 5.5689  | 0.2275   | 0.1751   | 0.1746 | 0.6138 |
| No log        | 34.0  | 238  | 1.4710          | 0.2425   | 0.8640     | 7.0556  | 0.2425   | 0.1957   | 0.1809 | 0.6048 |
| No log        | 35.0  | 245  | 1.4685          | 0.23     | 0.8632     | 5.5735  | 0.23     | 0.1940   | 0.1609 | 0.6188 |
| No log        | 36.0  | 252  | 1.4665          | 0.2375   | 0.8592     | 5.8835  | 0.2375   | 0.1952   | 0.1727 | 0.6050 |
| No log        | 37.0  | 259  | 1.4668          | 0.235    | 0.8540     | 5.3502  | 0.235    | 0.1966   | 0.1746 | 0.6056 |
| No log        | 38.0  | 266  | 1.4855          | 0.27     | 0.8510     | 5.3781  | 0.27     | 0.2124   | 0.1692 | 0.5825 |
| No log        | 39.0  | 273  | 1.5279          | 0.265    | 0.8562     | 6.2426  | 0.265    | 0.2126   | 0.1772 | 0.5831 |
| No log        | 40.0  | 280  | 1.5433          | 0.2425   | 0.8551     | 5.9574  | 0.2425   | 0.1867   | 0.1499 | 0.5874 |
| No log        | 41.0  | 287  | 1.5955          | 0.2525   | 0.8597     | 6.1628  | 0.2525   | 0.2024   | 0.1479 | 0.5891 |
| No log        | 42.0  | 294  | 1.5528          | 0.2475   | 0.8541     | 6.3624  | 0.2475   | 0.1908   | 0.1566 | 0.5735 |
| No log        | 43.0  | 301  | 1.5858          | 0.2675   | 0.8504     | 6.1261  | 0.2675   | 0.2174   | 0.1706 | 0.5674 |
| No log        | 44.0  | 308  | 1.6013          | 0.2725   | 0.8496     | 5.8409  | 0.2725   | 0.2463   | 0.1846 | 0.5807 |
| No log        | 45.0  | 315  | 1.5632          | 0.2625   | 0.8472     | 5.9669  | 0.2625   | 0.2307   | 0.1689 | 0.5689 |
| No log        | 46.0  | 322  | 1.6520          | 0.2675   | 0.8509     | 5.8544  | 0.2675   | 0.2325   | 0.1779 | 0.5622 |
| No log        | 47.0  | 329  | 1.6135          | 0.2625   | 0.8476     | 5.5208  | 0.2625   | 0.2504   | 0.1565 | 0.5759 |
| No log        | 48.0  | 336  | 1.6565          | 0.275    | 0.8466     | 5.9254  | 0.275    | 0.2527   | 0.2026 | 0.5616 |
| No log        | 49.0  | 343  | 1.6807          | 0.2625   | 0.8531     | 6.1297  | 0.2625   | 0.2259   | 0.1813 | 0.5664 |
| No log        | 50.0  | 350  | 1.7266          | 0.255    | 0.8560     | 6.0828  | 0.255    | 0.2315   | 0.1817 | 0.5735 |
| No log        | 51.0  | 357  | 1.7038          | 0.2525   | 0.8579     | 5.6442  | 0.2525   | 0.2405   | 0.1861 | 0.5828 |
| No log        | 52.0  | 364  | 1.7954          | 0.255    | 0.8583     | 5.7016  | 0.255    | 0.2227   | 0.1722 | 0.5725 |
| No log        | 53.0  | 371  | 1.7567          | 0.275    | 0.8557     | 6.1586  | 0.275    | 0.2523   | 0.1577 | 0.5619 |
| No log        | 54.0  | 378  | 1.7589          | 0.2525   | 0.8565     | 5.3969  | 0.2525   | 0.2325   | 0.1840 | 0.5661 |
| No log        | 55.0  | 385  | 1.7778          | 0.265    | 0.8569     | 5.8559  | 0.265    | 0.2447   | 0.1835 | 0.5640 |
| No log        | 56.0  | 392  | 1.8044          | 0.275    | 0.8592     | 5.9942  | 0.275    | 0.2517   | 0.1783 | 0.5627 |
| No log        | 57.0  | 399  | 1.8327          | 0.2625   | 0.8628     | 6.0224  | 0.2625   | 0.2333   | 0.1801 | 0.5560 |
| No log        | 58.0  | 406  | 1.8184          | 0.25     | 0.8609     | 6.0769  | 0.25     | 0.2333   | 0.1941 | 0.5718 |
| No log        | 59.0  | 413  | 1.8318          | 0.2575   | 0.8639     | 5.9454  | 0.2575   | 0.2364   | 0.1965 | 0.5743 |
| No log        | 60.0  | 420  | 1.8081          | 0.2525   | 0.8641     | 6.0119  | 0.2525   | 0.2380   | 0.1818 | 0.5755 |
| No log        | 61.0  | 427  | 1.8405          | 0.2625   | 0.8775     | 6.2129  | 0.2625   | 0.2474   | 0.1767 | 0.5908 |
| No log        | 62.0  | 434  | 1.9012          | 0.2625   | 0.8728     | 6.1015  | 0.2625   | 0.2373   | 0.1881 | 0.5716 |
| No log        | 63.0  | 441  | 1.8500          | 0.26     | 0.8728     | 6.3885  | 0.26     | 0.2414   | 0.1933 | 0.5809 |
| No log        | 64.0  | 448  | 1.8771          | 0.2675   | 0.8733     | 6.2730  | 0.2675   | 0.2553   | 0.2035 | 0.5800 |
| No log        | 65.0  | 455  | 1.8744          | 0.2575   | 0.8677     | 5.9805  | 0.2575   | 0.2392   | 0.1918 | 0.5663 |
| No log        | 66.0  | 462  | 1.8366          | 0.255    | 0.8694     | 6.0073  | 0.255    | 0.2403   | 0.2048 | 0.5807 |
| No log        | 67.0  | 469  | 1.8758          | 0.2575   | 0.8743     | 6.1015  | 0.2575   | 0.2381   | 0.2071 | 0.5825 |
| No log        | 68.0  | 476  | 1.8796          | 0.2675   | 0.8711     | 5.9457  | 0.2675   | 0.2470   | 0.2100 | 0.5737 |
| No log        | 69.0  | 483  | 1.8635          | 0.2675   | 0.8721     | 5.9312  | 0.2675   | 0.2493   | 0.1788 | 0.5751 |
| No log        | 70.0  | 490  | 1.8801          | 0.2625   | 0.8710     | 5.9629  | 0.2625   | 0.2467   | 0.1974 | 0.5721 |
| No log        | 71.0  | 497  | 1.8936          | 0.26     | 0.8791     | 6.0358  | 0.26     | 0.2481   | 0.1922 | 0.5844 |
| 0.9216        | 72.0  | 504  | 1.8736          | 0.275    | 0.8715     | 6.0493  | 0.275    | 0.2569   | 0.2099 | 0.5710 |
| 0.9216        | 73.0  | 511  | 1.8784          | 0.2525   | 0.8760     | 6.1441  | 0.2525   | 0.2401   | 0.1978 | 0.5849 |
| 0.9216        | 74.0  | 518  | 1.8843          | 0.2725   | 0.8763     | 6.1948  | 0.2725   | 0.2533   | 0.2007 | 0.5801 |
| 0.9216        | 75.0  | 525  | 1.8785          | 0.2675   | 0.8784     | 5.9868  | 0.2675   | 0.2578   | 0.1975 | 0.5851 |
| 0.9216        | 76.0  | 532  | 1.8812          | 0.275    | 0.8725     | 5.9367  | 0.275    | 0.2594   | 0.2037 | 0.5744 |
| 0.9216        | 77.0  | 539  | 1.8956          | 0.27     | 0.8746     | 5.9038  | 0.27     | 0.2541   | 0.1816 | 0.5738 |
| 0.9216        | 78.0  | 546  | 1.8897          | 0.265    | 0.8802     | 5.9763  | 0.265    | 0.2493   | 0.2098 | 0.5866 |
| 0.9216        | 79.0  | 553  | 1.8728          | 0.275    | 0.8752     | 6.0806  | 0.275    | 0.2623   | 0.1874 | 0.5794 |
| 0.9216        | 80.0  | 560  | 1.8887          | 0.2725   | 0.8759     | 6.2762  | 0.2725   | 0.2520   | 0.2005 | 0.5768 |
| 0.9216        | 81.0  | 567  | 1.8987          | 0.2725   | 0.8787     | 6.2444  | 0.2725   | 0.2587   | 0.2183 | 0.5773 |
| 0.9216        | 82.0  | 574  | 1.8759          | 0.2625   | 0.8773     | 6.1643  | 0.2625   | 0.2541   | 0.1922 | 0.5805 |
| 0.9216        | 83.0  | 581  | 1.8766          | 0.27     | 0.8748     | 6.0036  | 0.27     | 0.2554   | 0.1784 | 0.5762 |
| 0.9216        | 84.0  | 588  | 1.8809          | 0.2625   | 0.8764     | 6.0488  | 0.2625   | 0.2469   | 0.2030 | 0.5833 |
| 0.9216        | 85.0  | 595  | 1.8982          | 0.26     | 0.8775     | 6.0747  | 0.26     | 0.2453   | 0.1998 | 0.5851 |
| 0.9216        | 86.0  | 602  | 1.8912          | 0.27     | 0.8798     | 6.1894  | 0.27     | 0.2566   | 0.1938 | 0.5839 |
| 0.9216        | 87.0  | 609  | 1.8847          | 0.2775   | 0.8769     | 6.2744  | 0.2775   | 0.2643   | 0.2019 | 0.5775 |
| 0.9216        | 88.0  | 616  | 1.8734          | 0.265    | 0.8741     | 6.1928  | 0.265    | 0.2526   | 0.1763 | 0.5820 |
| 0.9216        | 89.0  | 623  | 1.8760          | 0.2725   | 0.8768     | 6.0274  | 0.2725   | 0.2620   | 0.2039 | 0.5792 |
| 0.9216        | 90.0  | 630  | 1.8860          | 0.265    | 0.8771     | 6.0912  | 0.265    | 0.2518   | 0.1924 | 0.5810 |
| 0.9216        | 91.0  | 637  | 1.8865          | 0.2625   | 0.8750     | 6.2350  | 0.2625   | 0.2476   | 0.1844 | 0.5791 |
| 0.9216        | 92.0  | 644  | 1.8815          | 0.2725   | 0.8733     | 6.0962  | 0.2725   | 0.2563   | 0.2013 | 0.5721 |
| 0.9216        | 93.0  | 651  | 1.8794          | 0.27     | 0.8756     | 6.2535  | 0.27     | 0.2562   | 0.2028 | 0.5764 |
| 0.9216        | 94.0  | 658  | 1.8835          | 0.2675   | 0.8769     | 6.2039  | 0.2675   | 0.2562   | 0.1928 | 0.5773 |
| 0.9216        | 95.0  | 665  | 1.8904          | 0.27     | 0.8786     | 6.1504  | 0.27     | 0.2543   | 0.2034 | 0.5768 |
| 0.9216        | 96.0  | 672  | 1.8911          | 0.26     | 0.8788     | 6.1527  | 0.26     | 0.2465   | 0.2025 | 0.5829 |
| 0.9216        | 97.0  | 679  | 1.8871          | 0.265    | 0.8776     | 6.0994  | 0.265    | 0.2519   | 0.2126 | 0.5794 |
| 0.9216        | 98.0  | 686  | 1.8825          | 0.265    | 0.8769     | 6.1564  | 0.265    | 0.2516   | 0.1987 | 0.5776 |
| 0.9216        | 99.0  | 693  | 1.8803          | 0.2675   | 0.8766     | 6.1183  | 0.2675   | 0.2561   | 0.2095 | 0.5798 |
| 0.9216        | 100.0 | 700  | 1.8796          | 0.26     | 0.8768     | 6.0962  | 0.26     | 0.2480   | 0.2002 | 0.5815 |
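
The Brier Loss, NLL, and ECE columns above are calibration metrics rather than standard `Trainer` outputs. The exact implementation used for this run is not recorded; the sketch below shows how such quantities are conventionally computed from softmax probabilities (AURC, the area under the risk-coverage curve, is omitted for brevity), so constants and binning may differ from the reported numbers.

```python
import numpy as np

def calibration_metrics(probs, labels, n_bins=10):
    """Sketch of common calibration metrics from softmax probabilities.

    probs:  (N, C) array of predicted class probabilities
    labels: (N,) array of integer ground-truth labels
    """
    n, c = probs.shape
    onehot = np.eye(c)[labels]

    # Multi-class Brier score: mean squared distance to the one-hot target.
    brier = np.mean(np.sum((probs - onehot) ** 2, axis=1))

    # Negative log-likelihood of the true class.
    nll = -np.mean(np.log(probs[np.arange(n), labels] + 1e-12))

    # Expected calibration error: bin by confidence, compare per-bin accuracy vs. confidence.
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(bins[:-1], bins[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())

    return {"brier": brier, "nll": nll, "ece": ece}
```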


### Framework versions

- Transformers 4.26.1
- PyTorch 1.13.1.post200
- Datasets 2.9.0
- Tokenizers 0.13.2