jordyvl/dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix

+---
+tags:
+- generated_from_trainer
+metrics:
+- accuracy
+model-index:
+- name: dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# dit-tiny_rvl_cdip_100_examples_per_class_kd_MSE_lr_fix
+This model is a fine-tuned version of [microsoft/dit-base](https://huggingface.co/microsoft/dit-base) on an unspecified dataset (the Trainer recorded no dataset name).
+It achieves the following results on the evaluation set:
+- Loss: 1.4358
+- Accuracy: 0.195
+- Brier Loss: 0.9035
+- NLL (negative log-likelihood): 12.0550
+- F1 Micro: 0.195
+- F1 Macro: 0.1471
+- ECE (expected calibration error): 0.1675
+- AURC (area under the risk-coverage curve): 0.6988
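The card does not say how these metrics were computed. The following is a minimal sketch, assuming single-label classification and the common sum-over-classes form of the multiclass Brier loss. The function names are hypothetical, and the 16-class setting is an assumption based on the RVL-CDIP reference in the model name. Under these assumptions it shows why the Accuracy and F1 Micro values coincide, and what Brier loss a uniform prediction would give.

```python
def brier_loss(probs, label):
    """Multiclass Brier loss for one example: squared error against the one-hot label."""
    return sum((p - (1.0 if k == label else 0.0)) ** 2 for k, p in enumerate(probs))


def accuracy(preds, labels):
    """Fraction of examples whose predicted class matches the true class."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def f1_micro(preds, labels):
    """Micro-averaged F1: pool true/false positives and false negatives over all classes."""
    tp = sum(1 for p, y in zip(preds, labels) if p == y)
    # In single-label classification every wrong prediction is one false positive
    # (for the predicted class) and one false negative (for the true class).
    fp = sum(1 for p, y in zip(preds, labels) if p != y)
    fn = fp
    return 2 * tp / (2 * tp + fp + fn)


# Assumed number of classes (RVL-CDIP has 16 document categories).
NUM_CLASSES = 16
uniform = [1.0 / NUM_CLASSES] * NUM_CLASSES
uniform_brier = brier_loss(uniform, label=3)  # equals 1 - 1/NUM_CLASSES

# Toy predictions, purely illustrative.
toy_preds = [0, 1, 2, 2, 5, 7, 7, 3]
toy_labels = [0, 1, 2, 4, 5, 6, 7, 9]
toy_acc = accuracy(toy_preds, toy_labels)
toy_f1 = f1_micro(toy_preds, toy_labels)

print(uniform_brier, toy_acc, toy_f1)
```

With 16 classes the uniform-prediction Brier loss is 0.9375, close to the 0.9368 reported after epoch 1 in the training results table, which would be consistent with near-uniform predictions early in training if this definition is the one that was used.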
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0001
+- train_batch_size: 32
+- eval_batch_size: 32
+- seed: 42
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 25
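To make the scheduler settings concrete, here is a small sketch of a linear schedule with warmup using the values listed above and the final step count (625) from the training results table. It assumes the number of warmup steps is the warmup ratio times the total steps, rounded up, and that the rate rises linearly from zero and then decays linearly to zero. The helper name `lr_at` is hypothetical, and this is an illustration rather than the exact code used for this run.

```python
import math

BASE_LR = 1e-4       # learning_rate from the card
TOTAL_STEPS = 625    # final Step value in the training results table
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the card

# Assumption: warmup steps = ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step):
    """Learning rate at a given optimizer step under linear warmup then linear decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Linear decay from BASE_LR down to 0 at TOTAL_STEPS.
    remaining = TOTAL_STEPS - step
    return BASE_LR * max(0.0, remaining / max(1, TOTAL_STEPS - WARMUP_STEPS))


for step in (0, 25, WARMUP_STEPS, 325, TOTAL_STEPS):
    print(f"step {step:>3}: lr = {lr_at(step):.2e}")
```

Under these assumptions the peak rate of 1e-4 is reached only after roughly two and a half epochs (25 steps per epoch), so the first rows of the results table were produced at a reduced learning rate.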
+### Training results
+| Training Loss | Epoch | Step | Validation Loss | Accuracy | Brier Loss | NLL     | F1 Micro | F1 Macro | ECE    | AURC   |
+|:-------------:|:-----:|:----:|:---------------:|:--------:|:----------:|:-------:|:--------:|:--------:|:------:|:------:|
+| No log        | 1.0   | 25   | 1.5167          | 0.07     | 0.9368     | 20.8948 | 0.07     | 0.0305   | 0.1106 | 0.8850 |
+| No log        | 2.0   | 50   | 1.5246          | 0.08     | 0.9362     | 21.4368 | 0.08     | 0.0346   | 0.1200 | 0.8659 |
+| No log        | 3.0   | 75   | 1.5053          | 0.1      | 0.9340     | 23.7241 | 0.1000   | 0.0522   | 0.1280 | 0.8087 |
+| No log        | 4.0   | 100  | 1.5097          | 0.0975   | 0.9322     | 17.3004 | 0.0975   | 0.0487   | 0.1220 | 0.8220 |
+| No log        | 5.0   | 125  | 1.4926          | 0.12     | 0.9296     | 16.3893 | 0.12     | 0.0600   | 0.1284 | 0.7752 |
+| No log        | 6.0   | 150  | 1.4838          | 0.105    | 0.9273     | 19.3692 | 0.1050   | 0.0356   | 0.1254 | 0.7955 |
+| No log        | 7.0   | 175  | 1.4729          | 0.0975   | 0.9229     | 18.6899 | 0.0975   | 0.0411   | 0.1134 | 0.7963 |
+| No log        | 8.0   | 200  | 1.4754          | 0.125    | 0.9196     | 17.7842 | 0.125    | 0.0676   | 0.1238 | 0.7778 |
+| No log        | 9.0   | 225  | 1.4725          | 0.1125   | 0.9193     | 16.6572 | 0.1125   | 0.0505   | 0.1254 | 0.7839 |
+| No log        | 10.0  | 250  | 1.4702          | 0.1175   | 0.9168     | 16.3975 | 0.1175   | 0.0556   | 0.1183 | 0.7638 |
+| No log        | 11.0  | 275  | 1.4648          | 0.1175   | 0.9169     | 18.4274 | 0.1175   | 0.0558   | 0.1219 | 0.7806 |
+| No log        | 12.0  | 300  | 1.4660          | 0.155    | 0.9166     | 15.6492 | 0.155    | 0.0791   | 0.1411 | 0.7512 |
+| No log        | 13.0  | 325  | 1.4684          | 0.16     | 0.9164     | 17.1698 | 0.16     | 0.1140   | 0.1519 | 0.7285 |
+| No log        | 14.0  | 350  | 1.4662          | 0.1175   | 0.9158     | 17.6999 | 0.1175   | 0.0501   | 0.1269 | 0.7637 |
+| No log        | 15.0  | 375  | 1.4602          | 0.1675   | 0.9143     | 13.2540 | 0.1675   | 0.1153   | 0.1515 | 0.7223 |
+| No log        | 16.0  | 400  | 1.4556          | 0.1325   | 0.9138     | 13.3868 | 0.1325   | 0.0881   | 0.1323 | 0.7558 |
+| No log        | 17.0  | 425  | 1.4527          | 0.175    | 0.9128     | 11.1983 | 0.175    | 0.1334   | 0.1596 | 0.7153 |
+| No log        | 18.0  | 450  | 1.4535          | 0.1625   | 0.9111     | 17.6046 | 0.1625   | 0.1021   | 0.1435 | 0.7379 |
+| No log        | 19.0  | 475  | 1.4453          | 0.1825   | 0.9086     | 11.8948 | 0.1825   | 0.1228   | 0.1594 | 0.7098 |
+| 1.4614        | 20.0  | 500  | 1.4431          | 0.1525   | 0.9078     | 14.2631 | 0.1525   | 0.1115   | 0.1410 | 0.7293 |
+| 1.4614        | 21.0  | 525  | 1.4392          | 0.1825   | 0.9063     | 10.7664 | 0.1825   | 0.1378   | 0.1567 | 0.7058 |
+| 1.4614        | 22.0  | 550  | 1.4469          | 0.1775   | 0.9055     | 13.4724 | 0.1775   | 0.1212   | 0.1483 | 0.7107 |
+| 1.4614        | 23.0  | 575  | 1.4356          | 0.17     | 0.9039     | 11.8141 | 0.17     | 0.1232   | 0.1515 | 0.7091 |
+| 1.4614        | 24.0  | 600  | 1.4370          | 0.1875   | 0.9039     | 12.9338 | 0.1875   | 0.1384   | 0.1539 | 0.7017 |
+| 1.4614        | 25.0  | 625  | 1.4358          | 0.195    | 0.9035     | 12.0550 | 0.195    | 0.1471   | 0.1675 | 0.6988 |
+### Framework versions
+- Transformers 4.28.0.dev0
+- Pytorch 1.12.1+cu113
+- Datasets 2.12.0
+- Tokenizers 0.12.1