# bertimbaulaw-base-portuguese-cased

This model is a fine-tuned version of neuralmind/bert-base-portuguese-cased (the training dataset is not specified in this card). It achieves the following results on the evaluation set:
- Loss: 0.6440
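As a quick sanity check, the checkpoint can be loaded as a standard BERT encoder. The snippet below is a minimal usage sketch, not an official example: it assumes the model id `alfaneo/bertimbaulaw-base-portuguese-cased` shown further down in this card and uses the `transformers` library for feature extraction; the example sentence is illustrative only.

```python
# Hedged usage sketch: load the fine-tuned checkpoint as a plain BERT encoder.
# The model id is taken from this card; the example sentence is illustrative.
from transformers import AutoTokenizer, AutoModel

model_id = "alfaneo/bertimbaulaw-base-portuguese-cased"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("O contrato foi rescindido por justa causa.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, sequence_length, hidden_size)
```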
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows this list):
- learning_rate: 0.0001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-06
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 10000
- num_epochs: 15.0
- mixed_precision_training: Native AMP
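For reference, here is a hedged sketch of how these settings map onto `transformers.TrainingArguments`. The output directory name and `fp16=True` are assumptions (the card only states "Native AMP"), and the dataset and `Trainer` wiring are omitted because they are not described here.

```python
# Sketch of TrainingArguments mirroring the hyperparameters listed above.
# output_dir="output" and fp16=True (Native AMP) are assumptions; the dataset
# and Trainer setup are not documented in this card.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,   # effective train batch size: 16 * 8 = 128
    num_train_epochs=15.0,
    lr_scheduler_type="linear",
    warmup_steps=10_000,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    seed=42,
    fp16=True,                       # mixed precision (Native AMP)
)
```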
### Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
1.1985 | 0.22 | 2500 | 1.0940 |
1.0937 | 0.44 | 5000 | 1.0033 |
1.0675 | 0.66 | 7500 | 0.9753 |
1.0565 | 0.87 | 10000 | 0.9801 |
1.0244 | 1.09 | 12500 | 0.9526 |
0.9943 | 1.31 | 15000 | 0.9298 |
0.9799 | 1.53 | 17500 | 0.9035 |
0.95 | 1.75 | 20000 | 0.8835 |
0.933 | 1.97 | 22500 | 0.8636 |
0.9079 | 2.18 | 25000 | 0.8507 |
0.8938 | 2.4 | 27500 | 0.8397 |
0.8781 | 2.62 | 30000 | 0.8195 |
0.8647 | 2.84 | 32500 | 0.8088 |
0.8422 | 3.06 | 35000 | 0.7954 |
0.831 | 3.28 | 37500 | 0.7871 |
0.8173 | 3.5 | 40000 | 0.7721 |
0.8072 | 3.71 | 42500 | 0.7611 |
0.8011 | 3.93 | 45000 | 0.7532 |
0.7828 | 4.15 | 47500 | 0.7431 |
0.7691 | 4.37 | 50000 | 0.7367 |
0.7659 | 4.59 | 52500 | 0.7292 |
0.7606 | 4.81 | 55000 | 0.7245 |
0.8082 | 5.02 | 57500 | 0.7696 |
0.8114 | 5.24 | 60000 | 0.7695 |
0.8022 | 5.46 | 62500 | 0.7613 |
0.7986 | 5.68 | 65000 | 0.7558 |
0.8018 | 5.9 | 67500 | 0.7478 |
0.782 | 6.12 | 70000 | 0.7435 |
0.7743 | 6.34 | 72500 | 0.7367 |
0.774 | 6.55 | 75000 | 0.7313 |
0.7692 | 6.77 | 77500 | 0.7270 |
0.7604 | 6.99 | 80000 | 0.7200 |
0.7468 | 7.21 | 82500 | 0.7164 |
0.7486 | 7.43 | 85000 | 0.7117 |
0.7399 | 7.65 | 87500 | 0.7043 |
0.7306 | 7.86 | 90000 | 0.6956 |
0.7243 | 8.08 | 92500 | 0.6959 |
0.7132 | 8.3 | 95000 | 0.6916 |
0.71 | 8.52 | 97500 | 0.6853 |
0.7128 | 8.74 | 100000 | 0.6855 |
0.7088 | 8.96 | 102500 | 0.6809 |
0.7002 | 9.18 | 105000 | 0.6784 |
0.6953 | 9.39 | 107500 | 0.6737 |
0.695 | 9.61 | 110000 | 0.6714 |
0.6871 | 9.83 | 112500 | 0.6687 |
0.7161 | 10.05 | 115000 | 0.6961 |
0.7265 | 10.27 | 117500 | 0.7006 |
0.7284 | 10.49 | 120000 | 0.6941 |
0.724 | 10.7 | 122500 | 0.6887 |
0.7266 | 10.92 | 125000 | 0.6931 |
0.7051 | 11.14 | 127500 | 0.6846 |
0.7106 | 11.36 | 130000 | 0.6816 |
0.7011 | 11.58 | 132500 | 0.6830 |
0.6997 | 11.8 | 135000 | 0.6784 |
0.6969 | 12.02 | 137500 | 0.6734 |
0.6968 | 12.23 | 140000 | 0.6709 |
0.6867 | 12.45 | 142500 | 0.6656 |
0.6925 | 12.67 | 145000 | 0.6661 |
0.6795 | 12.89 | 147500 | 0.6606 |
0.6774 | 13.11 | 150000 | 0.6617 |
0.6756 | 13.33 | 152500 | 0.6563 |
0.6728 | 13.54 | 155000 | 0.6547 |
0.6732 | 13.76 | 157500 | 0.6520 |
0.6704 | 13.98 | 160000 | 0.6492 |
0.6666 | 14.2 | 162500 | 0.6446 |
0.6615 | 14.42 | 165000 | 0.6488 |
0.6638 | 14.64 | 167500 | 0.6523 |
0.6588 | 14.85 | 170000 | 0.6415 |
### Framework versions
- Transformers 4.12.5
- Pytorch 1.10.1+cu113
- Datasets 1.17.0
- Tokenizers 0.10.3
## Citing & Authors
If you use our work, please cite:
@incollection{Viegas_2023,
doi = {10.1007/978-3-031-36805-9_24},
url = {https://doi.org/10.1007%2F978-3-031-36805-9_24},
year = 2023,
publisher = {Springer Nature Switzerland},
pages = {349--365},
author = {Charles F. O. Viegas and Bruno C. Costa and Renato P. Ishii},
title = {{JurisBERT}: A New Approach that~Converts a~Classification Corpus into~an~{STS} One},
booktitle = {Computational Science and Its Applications {\textendash} {ICCSA} 2023}
}