---
tags:
- generated_from_trainer
metrics:
- accuracy
model-index:
- name: dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix
  results: []
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix

This model is a fine-tuned version of [microsoft/dit-base](https://huggingface.co/microsoft/dit-base) on an unspecified dataset (the Trainer recorded the dataset name as `None`).
It achieves the following results on the evaluation set:
- Loss: 1.4358
- Accuracy: 0.195
- Brier Loss: 0.9035
- Nll: 12.0550
- F1 Micro: 0.195
- F1 Macro: 0.1471
- Ece: 0.1675
- Aurc: 0.6988
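
The card does not say how the calibration metrics above were computed. As a rough illustration only, the sketch below shows one common way to compute a multiclass Brier loss and an expected calibration error (ECE) from predicted class probabilities. The function names, the sum-over-classes Brier definition, and the 10 equal-width confidence bins are assumptions, not details taken from the training script.

```python
def brier_loss(probs, labels):
    """Mean over samples of the squared distance between the predicted
    probability vector and the one-hot target (assumed definition)."""
    total = 0.0
    for p, y in zip(probs, labels):
        total += sum((pk - (1.0 if k == y else 0.0)) ** 2 for k, pk in enumerate(p))
    return total / len(labels)


def expected_calibration_error(probs, labels, n_bins=10):
    """ECE over equal-width bins of the top-class confidence (assumed binning)."""
    # Each bin stores [sum of confidences, number of correct predictions].
    bins = [[0.0, 0.0] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        conf = max(p)
        pred = p.index(conf)
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 goes in the last bin
        bins[b][0] += conf
        bins[b][1] += 1.0 if pred == y else 0.0
    # Weighted |confidence - accuracy| gap, summed over bins.
    return sum(abs(conf_sum - correct) for conf_sum, correct in bins) / len(labels)
```

Under this definition, a model that predicts a uniform distribution over K classes has a Brier loss of (K - 1) / K, which gives a useful reference point when reading values close to 1.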

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 25
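
To make the scheduler settings above concrete, here is a minimal sketch of a linear schedule with warmup in plain Python. The total of 625 steps is taken from the last row of the training results table; the function name is hypothetical, and rounding the warmup length up to a whole step is an assumption about how the ratio is converted into steps.

```python
import math

LEARNING_RATE = 1e-4   # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 625      # 25 epochs x 25 steps per epoch, per the results table


def linear_schedule_lr(step, base_lr=LEARNING_RATE,
                       total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: the warmup length is rounded up to a whole number of steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at the final step.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

With these values the rate ramps up over the first 63 steps, roughly the first two and a half epochs, and then decays linearly to zero at step 625.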

### Training results

| Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | Nll     | F1 Micro | F1 Macro | Ece    | Aurc   |
|:-------------:|:-----:|:----:|:---------------:|:--------:|:----------:|:-------:|:--------:|:--------:|:------:|:------:|
| No log        | 1.0   | 25   | 1.5167          | 0.07     | 0.9368     | 20.8948 | 0.07     | 0.0305   | 0.1106 | 0.8850 |
| No log        | 2.0   | 50   | 1.5246          | 0.08     | 0.9362     | 21.4368 | 0.08     | 0.0346   | 0.1200 | 0.8659 |
| No log        | 3.0   | 75   | 1.5053          | 0.1      | 0.9340     | 23.7241 | 0.1000   | 0.0522   | 0.1280 | 0.8087 |
| No log        | 4.0   | 100  | 1.5097          | 0.0975   | 0.9322     | 17.3004 | 0.0975   | 0.0487   | 0.1220 | 0.8220 |
| No log        | 5.0   | 125  | 1.4926          | 0.12     | 0.9296     | 16.3893 | 0.12     | 0.0600   | 0.1284 | 0.7752 |
| No log        | 6.0   | 150  | 1.4838          | 0.105    | 0.9273     | 19.3692 | 0.1050   | 0.0356   | 0.1254 | 0.7955 |
| No log        | 7.0   | 175  | 1.4729          | 0.0975   | 0.9229     | 18.6899 | 0.0975   | 0.0411   | 0.1134 | 0.7963 |
| No log        | 8.0   | 200  | 1.4754          | 0.125    | 0.9196     | 17.7842 | 0.125    | 0.0676   | 0.1238 | 0.7778 |
| No log        | 9.0   | 225  | 1.4725          | 0.1125   | 0.9193     | 16.6572 | 0.1125   | 0.0505   | 0.1254 | 0.7839 |
| No log        | 10.0  | 250  | 1.4702          | 0.1175   | 0.9168     | 16.3975 | 0.1175   | 0.0556   | 0.1183 | 0.7638 |
| No log        | 11.0  | 275  | 1.4648          | 0.1175   | 0.9169     | 18.4274 | 0.1175   | 0.0558   | 0.1219 | 0.7806 |
| No log        | 12.0  | 300  | 1.4660          | 0.155    | 0.9166     | 15.6492 | 0.155    | 0.0791   | 0.1411 | 0.7512 |
| No log        | 13.0  | 325  | 1.4684          | 0.16     | 0.9164     | 17.1698 | 0.16     | 0.1140   | 0.1519 | 0.7285 |
| No log        | 14.0  | 350  | 1.4662          | 0.1175   | 0.9158     | 17.6999 | 0.1175   | 0.0501   | 0.1269 | 0.7637 |
| No log        | 15.0  | 375  | 1.4602          | 0.1675   | 0.9143     | 13.2540 | 0.1675   | 0.1153   | 0.1515 | 0.7223 |
| No log        | 16.0  | 400  | 1.4556          | 0.1325   | 0.9138     | 13.3868 | 0.1325   | 0.0881   | 0.1323 | 0.7558 |
| No log        | 17.0  | 425  | 1.4527          | 0.175    | 0.9128     | 11.1983 | 0.175    | 0.1334   | 0.1596 | 0.7153 |
| No log        | 18.0  | 450  | 1.4535          | 0.1625   | 0.9111     | 17.6046 | 0.1625   | 0.1021   | 0.1435 | 0.7379 |
| No log        | 19.0  | 475  | 1.4453          | 0.1825   | 0.9086     | 11.8948 | 0.1825   | 0.1228   | 0.1594 | 0.7098 |
| 1.4614        | 20.0  | 500  | 1.4431          | 0.1525   | 0.9078     | 14.2631 | 0.1525   | 0.1115   | 0.1410 | 0.7293 |
| 1.4614        | 21.0  | 525  | 1.4392          | 0.1825   | 0.9063     | 10.7664 | 0.1825   | 0.1378   | 0.1567 | 0.7058 |
| 1.4614        | 22.0  | 550  | 1.4469          | 0.1775   | 0.9055     | 13.4724 | 0.1775   | 0.1212   | 0.1483 | 0.7107 |
| 1.4614        | 23.0  | 575  | 1.4356          | 0.17     | 0.9039     | 11.8141 | 0.17     | 0.1232   | 0.1515 | 0.7091 |
| 1.4614        | 24.0  | 600  | 1.4370          | 0.1875   | 0.9039     | 12.9338 | 0.1875   | 0.1384   | 0.1539 | 0.7017 |
| 1.4614        | 25.0  | 625  | 1.4358          | 0.195    | 0.9035     | 12.0550 | 0.195    | 0.1471   | 0.1675 | 0.6988 |


### Framework versions

- Transformers 4.28.0.dev0
- Pytorch 1.12.1+cu113
- Datasets 2.12.0
- Tokenizers 0.12.1