# categorization-finetuned-20220721-164940-distilled-20220811-013354
This model is a fine-tuned version of [carted-nlp/categorization-finetuned-20220721-164940](https://huggingface.co/carted-nlp/categorization-finetuned-20220721-164940) on an unspecified dataset. It achieves the following results on the evaluation set (a loading sketch follows the metrics):
- Loss: 0.0645
- Accuracy: 0.8776
- F1: 0.8768
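As a quick sanity check, the checkpoint should load with the standard `transformers` sequence-classification classes. This is a minimal sketch, assuming the model is published on the Hub under this card's name in the `carted-nlp` namespace; the example input is made up and the label set is not documented here.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed Hub id: this card's title under the carted-nlp namespace (unverified).
model_id = "carted-nlp/categorization-finetuned-20220721-164940-distilled-20220811-013354"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Hypothetical product-style input; the real category labels are not documented.
inputs = tokenizer("Stainless steel 12-cup coffee maker", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
predicted_label = model.config.id2label[logits.argmax(dim=-1).item()]
print(predicted_label)
```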
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the reconstruction sketch after this list):
- learning_rate: 2e-05
- train_batch_size: 64
- eval_batch_size: 96
- seed: 314
- gradient_accumulation_steps: 4
- total_train_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 1500
- num_epochs: 30.0
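These values map directly onto `transformers.TrainingArguments`. The sketch below reconstructs that configuration under the assumption of a single-device run (64 per-device batch size × 4 accumulation steps = 256 total); `output_dir` is a placeholder, and any distillation-specific wiring (teacher model, distillation loss) used for this run is not documented in this card.

```python
from transformers import TrainingArguments

# Reconstruction of the hyperparameters listed above. Only these values are
# grounded in the card; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="categorization-distilled",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=64,
    per_device_eval_batch_size=96,
    seed=314,
    gradient_accumulation_steps=4,  # 64 * 4 = 256 total train batch size
    lr_scheduler_type="cosine",
    warmup_steps=1500,
    num_train_epochs=30.0,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```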
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy | F1 |
|:-------------:|:-----:|:------:|:---------------:|:--------:|:------:|
| 0.2702 | 0.56 | 2500 | 0.1290 | 0.7832 | 0.7783 |
| 0.1246 | 1.12 | 5000 | 0.1047 | 0.8169 | 0.8137 |
| 0.1066 | 1.69 | 7500 | 0.0945 | 0.8301 | 0.8276 |
| 0.0975 | 2.25 | 10000 | 0.0888 | 0.8386 | 0.8367 |
| 0.0917 | 2.81 | 12500 | 0.0849 | 0.8445 | 0.8428 |
| 0.0865 | 3.37 | 15000 | 0.0818 | 0.8496 | 0.8484 |
| 0.0835 | 3.94 | 17500 | 0.0796 | 0.8526 | 0.8509 |
| 0.08 | 4.5 | 20000 | 0.0777 | 0.8552 | 0.8542 |
| 0.0778 | 5.06 | 22500 | 0.0763 | 0.8580 | 0.8567 |
| 0.0753 | 5.62 | 25000 | 0.0744 | 0.8604 | 0.8592 |
| 0.0739 | 6.19 | 27500 | 0.0738 | 0.8614 | 0.8603 |
| 0.0716 | 6.75 | 30000 | 0.0729 | 0.8630 | 0.8620 |
| 0.0701 | 7.31 | 32500 | 0.0719 | 0.8645 | 0.8638 |
| 0.0689 | 7.87 | 35000 | 0.0708 | 0.8657 | 0.8647 |
| 0.067 | 8.43 | 37500 | 0.0705 | 0.8671 | 0.8660 |
| 0.0669 | 9.0 | 40000 | 0.0699 | 0.8681 | 0.8674 |
| 0.0647 | 9.56 | 42500 | 0.0697 | 0.8683 | 0.8673 |
| 0.0641 | 10.12 | 45000 | 0.0693 | 0.8691 | 0.8681 |
| 0.063 | 10.68 | 47500 | 0.0685 | 0.8702 | 0.8694 |
| 0.0618 | 11.25 | 50000 | 0.0681 | 0.8709 | 0.8701 |
| 0.0614 | 11.81 | 52500 | 0.0675 | 0.8720 | 0.8712 |
| 0.0601 | 12.37 | 55000 | 0.0678 | 0.8724 | 0.8713 |
| 0.0598 | 12.93 | 57500 | 0.0670 | 0.8732 | 0.8725 |
| 0.0584 | 13.5 | 60000 | 0.0670 | 0.8732 | 0.8723 |
| 0.0584 | 14.06 | 62500 | 0.0665 | 0.8740 | 0.8732 |
| 0.0572 | 14.62 | 65000 | 0.0665 | 0.8744 | 0.8734 |
| 0.0567 | 15.18 | 67500 | 0.0661 | 0.8753 | 0.8745 |
| 0.0561 | 15.74 | 70000 | 0.0660 | 0.8756 | 0.8750 |
| 0.0554 | 16.31 | 72500 | 0.0661 | 0.8759 | 0.8751 |
| 0.0552 | 16.87 | 75000 | 0.0656 | 0.8755 | 0.8749 |
| 0.0544 | 17.43 | 77500 | 0.0657 | 0.8762 | 0.8754 |
| 0.0544 | 17.99 | 80000 | 0.0654 | 0.8767 | 0.8760 |
| 0.0534 | 18.56 | 82500 | 0.0654 | 0.8767 | 0.8759 |
| 0.0534 | 19.12 | 85000 | 0.0653 | 0.8773 | 0.8767 |
| 0.0528 | 19.68 | 87500 | 0.0649 | 0.8775 | 0.8768 |
| 0.0525 | 20.24 | 90000 | 0.0651 | 0.8776 | 0.8769 |
| 0.0523 | 20.81 | 92500 | 0.0649 | 0.8775 | 0.8768 |
| 0.0517 | 21.37 | 95000 | 0.0648 | 0.8782 | 0.8775 |
| 0.0516 | 21.93 | 97500 | 0.0648 | 0.8783 | 0.8776 |
| 0.0511 | 22.49 | 100000 | 0.0648 | 0.8781 | 0.8774 |
| 0.0511 | 23.05 | 102500 | 0.0647 | 0.8783 | 0.8776 |
| 0.0508 | 23.62 | 105000 | 0.0647 | 0.8785 | 0.8778 |
| 0.0505 | 24.18 | 107500 | 0.0647 | 0.8785 | 0.8777 |
| 0.0505 | 24.74 | 110000 | 0.0646 | 0.8788 | 0.8781 |
| 0.0503 | 25.3 | 112500 | 0.0646 | 0.8786 | 0.8779 |
| 0.0502 | 25.87 | 115000 | 0.0646 | 0.8789 | 0.8782 |
| 0.0501 | 26.43 | 117500 | 0.0646 | 0.8788 | 0.8781 |
| 0.0501 | 26.99 | 120000 | 0.0645 | 0.8791 | 0.8784 |
| 0.05 | 27.55 | 122500 | 0.0646 | 0.8790 | 0.8783 |
| 0.0497 | 28.12 | 125000 | 0.0645 | 0.8792 | 0.8785 |
| 0.0499 | 28.68 | 127500 | 0.0645 | 0.8791 | 0.8784 |
| 0.0499 | 29.24 | 130000 | 0.0645 | 0.8792 | 0.8785 |
| 0.0497 | 29.8 | 132500 | 0.0645 | 0.8791 | 0.8784 |
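The Accuracy and F1 columns are consistent with a standard `compute_metrics` callback passed to the `Trainer`. Below is a minimal sketch using `datasets.load_metric` (available in the Datasets 2.3.2 listed under framework versions); the `average="weighted"` choice for multi-class F1 is an assumption, as the card does not state the averaging mode.

```python
import numpy as np
from datasets import load_metric

# Assumed metric setup; the card does not document the F1 averaging mode.
accuracy_metric = load_metric("accuracy")
f1_metric = load_metric("f1")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_metric.compute(
            predictions=predictions, references=labels
        )["accuracy"],
        # "weighted" is an assumption for this multi-class categorization task.
        "f1": f1_metric.compute(
            predictions=predictions, references=labels, average="weighted"
        )["f1"],
    }
```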
### Framework versions
- Transformers 4.17.0
- Pytorch 1.11.0+cu113
- Datasets 2.3.2
- Tokenizers 0.11.6