w2v-bert-grain-lg_grn_only

This model is a fine-tuned version of facebook/w2v-bert-2.0. The training dataset is not recorded in the card metadata (it is listed as None). It achieves the following results on the evaluation set:

  • Loss: 0.1053
  • Wer: 0.0336
  • Cer: 0.0113
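
For readers unfamiliar with the metrics, the following is a minimal sketch of how word error rate (Wer) and character error rate (Cer) are commonly defined: the edit distance between reference and hypothesis, divided by the reference length in words or characters. The card does not say which implementation produced the numbers above, so the function names and the example sentences here are illustrative assumptions, not the evaluation code that was actually used.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions, insertions and deletions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for a single utterance (reference must be non-empty)."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for a single utterance (spaces count as characters)."""
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Hypothetical transcripts, not taken from the evaluation set.
reference = "the cat sat on the mat"
hypothesis = "the cat sat on mat"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

Read this way, a Wer of 0.0336 corresponds to roughly 3.4 word errors per 100 reference words. Corpus-level scores are usually computed by summing errors and reference lengths over all utterances rather than averaging per-utterance rates.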

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 16
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 100
  • mixed_precision_training: Native AMP
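
The relationship between these values can be sketched in a few lines. The snippet below shows how total_train_batch_size follows from train_batch_size and gradient_accumulation_steps, and what a linear learning-rate schedule would look like over the configured 100 epochs. It assumes a single training device and no warmup (neither is stated in the card), and takes 482 optimizer steps per epoch from the first row of the training results. The function name linear_lr is hypothetical.

```python
# Values copied from the hyperparameter list.
train_batch_size = 16
gradient_accumulation_steps = 2
learning_rate = 5e-05
num_epochs = 100

# Taken from the training results: step 482 is reached at epoch 1.0.
steps_per_epoch = 482

# Assumption: one device, so the effective batch is batch size x accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
total_steps = steps_per_epoch * num_epochs


def linear_lr(step, warmup_steps=0):
    """Linear warmup (optional) followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return learning_rate * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / max(1, total_steps - warmup_steps)


print("effective batch size:", total_train_batch_size)
print("scheduled optimizer steps:", total_steps)
print("learning rate at step 27956:", linear_lr(27956))
```

Under these assumptions the schedule is sized for 48,200 steps, so the last logged step (27956, epoch 58) would still be at about 42% of the initial learning rate.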

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer | Cer |
|---|---|---|---|---|---|
| 0.992 | 1.0 | 482 | 0.1349 | 0.1317 | 0.0273 |
| 0.178 | 2.0 | 964 | 0.1184 | 0.0983 | 0.0213 |
| 0.12 | 3.0 | 1446 | 0.1047 | 0.0833 | 0.0185 |
| 0.091 | 4.0 | 1928 | 0.0996 | 0.0742 | 0.0175 |
| 0.0721 | 5.0 | 2410 | 0.0946 | 0.0712 | 0.0175 |
| 0.0593 | 6.0 | 2892 | 0.1013 | 0.0686 | 0.0168 |
| 0.0479 | 7.0 | 3374 | 0.0908 | 0.0614 | 0.0148 |
| 0.0421 | 8.0 | 3856 | 0.0956 | 0.0649 | 0.0159 |
| 0.0371 | 9.0 | 4338 | 0.1026 | 0.0694 | 0.0170 |
| 0.0328 | 10.0 | 4820 | 0.1046 | 0.0592 | 0.0145 |
| 0.031 | 11.0 | 5302 | 0.0912 | 0.0529 | 0.0134 |
| 0.0255 | 12.0 | 5784 | 0.0870 | 0.0547 | 0.0140 |
| 0.0224 | 13.0 | 6266 | 0.1073 | 0.0588 | 0.0146 |
| 0.0207 | 14.0 | 6748 | 0.0963 | 0.0493 | 0.0136 |
| 0.0212 | 15.0 | 7230 | 0.1016 | 0.0484 | 0.0149 |
| 0.0183 | 16.0 | 7712 | 0.0994 | 0.0456 | 0.0125 |
| 0.0185 | 17.0 | 8194 | 0.1107 | 0.0495 | 0.0134 |
| 0.0181 | 18.0 | 8676 | 0.1012 | 0.0482 | 0.0136 |
| 0.0153 | 19.0 | 9158 | 0.0947 | 0.0506 | 0.0140 |
| 0.0131 | 20.0 | 9640 | 0.0890 | 0.0475 | 0.0121 |
| 0.0113 | 21.0 | 10122 | 0.0884 | 0.0475 | 0.0126 |
| 0.0114 | 22.0 | 10604 | 0.1205 | 0.0597 | 0.0147 |
| 0.0117 | 23.0 | 11086 | 0.0864 | 0.0404 | 0.0111 |
| 0.0107 | 24.0 | 11568 | 0.0939 | 0.0401 | 0.0122 |
| 0.0094 | 25.0 | 12050 | 0.0997 | 0.0404 | 0.0119 |
| 0.0078 | 26.0 | 12532 | 0.0952 | 0.0399 | 0.0121 |
| 0.0088 | 27.0 | 13014 | 0.1014 | 0.0417 | 0.0116 |
| 0.0077 | 28.0 | 13496 | 0.0954 | 0.0380 | 0.0110 |
| 0.0072 | 29.0 | 13978 | 0.1035 | 0.0427 | 0.0124 |
| 0.0084 | 30.0 | 14460 | 0.0977 | 0.0401 | 0.0119 |
| 0.0082 | 31.0 | 14942 | 0.0929 | 0.0378 | 0.0117 |
| 0.0084 | 32.0 | 15424 | 0.0966 | 0.0397 | 0.0119 |
| 0.0055 | 33.0 | 15906 | 0.0967 | 0.0401 | 0.0115 |
| 0.006 | 34.0 | 16388 | 0.0899 | 0.0354 | 0.0107 |
| 0.006 | 35.0 | 16870 | 0.0954 | 0.0351 | 0.0107 |
| 0.0049 | 36.0 | 17352 | 0.0988 | 0.0484 | 0.0128 |
| 0.0073 | 37.0 | 17834 | 0.0947 | 0.0349 | 0.0107 |
| 0.0049 | 38.0 | 18316 | 0.0893 | 0.0343 | 0.0104 |
| 0.0036 | 39.0 | 18798 | 0.0909 | 0.0317 | 0.0097 |
| 0.0049 | 40.0 | 19280 | 0.0875 | 0.0328 | 0.0099 |
| 0.0061 | 41.0 | 19762 | 0.1071 | 0.0371 | 0.0114 |
| 0.0059 | 42.0 | 20244 | 0.0979 | 0.0380 | 0.0114 |
| 0.0043 | 43.0 | 20726 | 0.0914 | 0.0347 | 0.0102 |
| 0.0034 | 44.0 | 21208 | 0.0946 | 0.0321 | 0.0100 |
| 0.004 | 45.0 | 21690 | 0.0905 | 0.0338 | 0.0097 |
| 0.0038 | 46.0 | 22172 | 0.0967 | 0.0312 | 0.0104 |
| 0.0023 | 47.0 | 22654 | 0.0986 | 0.0336 | 0.0104 |
| 0.0025 | 48.0 | 23136 | 0.0873 | 0.0299 | 0.0095 |
| 0.0027 | 49.0 | 23618 | 0.1071 | 0.0349 | 0.0111 |
| 0.003 | 50.0 | 24100 | 0.0968 | 0.0293 | 0.0098 |
| 0.0033 | 51.0 | 24582 | 0.1058 | 0.0404 | 0.0120 |
| 0.0034 | 52.0 | 25064 | 0.1020 | 0.0367 | 0.0113 |
| 0.0031 | 53.0 | 25546 | 0.0950 | 0.0302 | 0.0093 |
| 0.0016 | 54.0 | 26028 | 0.0988 | 0.0315 | 0.0100 |
| 0.0027 | 55.0 | 26510 | 0.0868 | 0.0297 | 0.0096 |
| 0.003 | 56.0 | 26992 | 0.0955 | 0.0332 | 0.0103 |
| 0.002 | 57.0 | 27474 | 0.0930 | 0.0315 | 0.0102 |
| 0.0022 | 58.0 | 27956 | 0.1053 | 0.0336 | 0.0113 |
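
The headline results at the top of the card (Loss 0.1053, Wer 0.0336, Cer 0.0113) match the last logged row, epoch 58, which is not the row with the lowest error rates. As a rough sketch of how one might pick a checkpoint from such a log, the snippet below selects the best row per metric from a hand-copied subset of the rows above. The tuple layout and variable names are illustrative assumptions, and whether a checkpoint was actually saved at each of these epochs is not stated in the card.

```python
# (epoch, step, validation_loss, wer, cer): a subset of rows copied from the results above.
rows = [
    (1, 482, 0.1349, 0.1317, 0.0273),
    (23, 11086, 0.0864, 0.0404, 0.0111),
    (39, 18798, 0.0909, 0.0317, 0.0097),
    (48, 23136, 0.0873, 0.0299, 0.0095),
    (50, 24100, 0.0968, 0.0293, 0.0098),
    (53, 25546, 0.0950, 0.0302, 0.0093),
    (58, 27956, 0.1053, 0.0336, 0.0113),
]

# Lower is better for all three metrics.
best_by_loss = min(rows, key=lambda r: r[2])
best_by_wer = min(rows, key=lambda r: r[3])
best_by_cer = min(rows, key=lambda r: r[4])

print("lowest validation loss: epoch", best_by_loss[0], "loss", best_by_loss[2])
print("lowest WER: epoch", best_by_wer[0], "WER", best_by_wer[3])
print("lowest CER: epoch", best_by_cer[0], "CER", best_by_cer[4])
```

The three metrics do not agree on a single best epoch, so the choice depends on which error rate matters most for the intended use.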

Framework versions

  • Transformers 4.46.1
  • Pytorch 2.1.0+cu118
  • Datasets 3.1.0
  • Tokenizers 0.20.1

Model size

  • Parameters: 606M
  • Tensor type: F32
  • Format: Safetensors