---
license: mit
base_model: facebook/m2m100_1.2B
tags:
  - generated_from_trainer
metrics:
  - bleu
model-index:
  - name: cs_m2m_0.001_50_v0.2
    results: []
---

cs_m2m_0.001_50_v0.2

This model is a fine-tuned version of facebook/m2m100_1.2B on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 8.4343
  • Bleu: 0.0488
  • Gen Len: 93.2857
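The card does not say which languages the fine-tune covers; the `cs` prefix in the model name hints at Czech, but that is a guess. A minimal inference sketch using the standard M2M100 API, with the repository id and the cs→en direction as assumptions:

```python
# Minimal inference sketch. The repo id and the language pair are
# assumptions; the card only gives the model name cs_m2m_0.001_50_v0.2.
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

model_id = "kmok1/cs_m2m_0.001_50_v0.2"  # hypothetical repo id
tokenizer = M2M100Tokenizer.from_pretrained(model_id)
model = M2M100ForConditionalGeneration.from_pretrained(model_id)

tokenizer.src_lang = "cs"  # assumed source language (Czech)
inputs = tokenizer("Ahoj, jak se máš?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("en"),  # assumed target language
    max_new_tokens=200,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```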

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
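
A hedged sketch of how these values map onto `Seq2SeqTrainingArguments`; `output_dir`, the per-epoch evaluation strategy, and `predict_with_generate` are assumptions, while the optimizer and scheduler settings are the values listed above:

```python
# Sketch only: reconstructs the hyperparameters above as training
# arguments. output_dir, evaluation_strategy, and predict_with_generate
# are assumptions not stated on the card.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="cs_m2m_0.001_50_v0.2",  # assumed
    learning_rate=0.001,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # assumed; the results table reports per-epoch metrics
    predict_with_generate=True,   # assumed; needed to compute Bleu and Gen Len
)
```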

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
| 5.0853 | 1.0 | 6 | 6.9325 | 0.0 | 5.0 |
| 4.3538 | 2.0 | 12 | 7.0396 | 0.1923 | 7.5714 |
| 4.6426 | 3.0 | 18 | 7.0321 | 0.1563 | 42.1429 |
| 5.1737 | 4.0 | 24 | 7.0390 | 0.0335 | 103.5238 |
| 3.9214 | 5.0 | 30 | 7.0585 | 0.0 | 5.0 |
| 4.7309 | 6.0 | 36 | 7.1597 | 0.1313 | 7.7619 |
| 4.3458 | 7.0 | 42 | 7.1875 | 0.0 | 5.0 |
| 4.1409 | 8.0 | 48 | 7.1934 | 0.308 | 18.1429 |
| 3.8187 | 9.0 | 54 | 7.1696 | 0.0 | 5.0 |
| 3.9459 | 10.0 | 60 | 7.1153 | 0.0 | 5.0 |
| 4.3563 | 11.0 | 66 | 7.2286 | 0.3581 | 8.619 |
| 4.4193 | 12.0 | 72 | 7.3526 | 0.0 | 5.0 |
| 4.4508 | 13.0 | 78 | 7.4000 | 0.0 | 5.0 |
| 4.115 | 14.0 | 84 | 7.4140 | 0.0 | 5.0 |
| 4.1807 | 15.0 | 90 | 7.4866 | 0.0 | 5.0 |
| 3.8422 | 16.0 | 96 | 7.6149 | 0.3839 | 9.0 |
| 4.1567 | 17.0 | 102 | 7.5413 | 0.2035 | 8.8095 |
| 4.3236 | 18.0 | 108 | 7.5256 | 0.2104 | 9.0 |
| 4.3343 | 19.0 | 114 | 7.5449 | 0.149 | 8.4286 |
| 4.3139 | 20.0 | 120 | 7.4758 | 0.0 | 5.0 |
| 3.1706 | 21.0 | 126 | 7.5896 | 0.0274 | 130.9048 |
| 3.0241 | 22.0 | 132 | 7.8300 | 0.2142 | 7.9524 |
| 4.5364 | 23.0 | 138 | 7.8698 | 0.0515 | 5.2857 |
| 5.4824 | 24.0 | 144 | 7.8732 | 0.0364 | 192.0952 |
| 3.8072 | 25.0 | 150 | 7.7993 | 0.0 | 5.0 |
| 3.9879 | 26.0 | 156 | 7.7222 | 0.0746 | 200.0 |
| 4.0397 | 27.0 | 162 | 7.6906 | 0.0436 | 146.0476 |
| 3.7429 | 28.0 | 168 | 7.7814 | 0.0 | 6.8095 |
| 3.7498 | 29.0 | 174 | 7.8873 | 0.2861 | 8.0 |
| 4.1991 | 30.0 | 180 | 8.0400 | 0.3032 | 13.5714 |
| 5.4424 | 31.0 | 186 | 7.9368 | 0.2537 | 15.1905 |
| 3.6523 | 32.0 | 192 | 7.8529 | 0.3288 | 7.1905 |
| 5.5908 | 33.0 | 198 | 7.8531 | 0.087 | 5.8571 |
| 3.8218 | 34.0 | 204 | 7.7538 | 0.2073 | 7.8571 |
| 3.8408 | 35.0 | 210 | 7.6796 | 0.1027 | 7.381 |
| 3.2347 | 36.0 | 216 | 7.8281 | 0.1662 | 8.9524 |
| 4.0158 | 37.0 | 222 | 7.8108 | 0.1907 | 23.9524 |
| 4.2395 | 38.0 | 228 | 7.7778 | 0.4592 | 19.4286 |
| 3.1863 | 39.0 | 234 | 7.8962 | 0.3148 | 16.1429 |
| 3.5706 | 40.0 | 240 | 8.2310 | 0.2962 | 33.7619 |
| 3.8174 | 41.0 | 246 | 8.0290 | 0.2864 | 14.1429 |
| 3.6144 | 42.0 | 252 | 7.9235 | 0.2737 | 11.8095 |
| 3.914 | 43.0 | 258 | 7.9920 | 0.286 | 15.5714 |
| 3.9245 | 44.0 | 264 | 7.9770 | 0.1251 | 35.8571 |
| 3.223 | 45.0 | 270 | 8.1701 | 0.1428 | 32.1429 |
| 3.5751 | 46.0 | 276 | 8.2573 | 0.2497 | 19.9048 |
| 3.7939 | 47.0 | 282 | 8.2825 | 0.0571 | 110.9524 |
| 3.8968 | 48.0 | 288 | 8.4263 | 0.0702 | 200.0 |
| 2.2186 | 49.0 | 294 | 8.3673 | 0.2356 | 107.5714 |
| 3.1794 | 50.0 | 300 | 8.2041 | 0.2142 | 38.5238 |
| 3.3098 | 51.0 | 306 | 8.2863 | 0.0349 | 113.3333 |
| 3.7869 | 52.0 | 312 | 8.3350 | 0.0655 | 95.2857 |
| 3.7239 | 53.0 | 318 | 8.2509 | 0.025 | 179.7143 |
| 3.5206 | 54.0 | 324 | 8.2301 | 0.074 | 75.9524 |
| 3.2225 | 55.0 | 330 | 8.1540 | 0.0242 | 173.5238 |
| 2.6646 | 56.0 | 336 | 8.1574 | 0.3081 | 91.2381 |
| 3.3487 | 57.0 | 342 | 8.1095 | 0.0597 | 115.6667 |
| 3.2801 | 58.0 | 348 | 8.1534 | 0.1796 | 39.8095 |
| 2.7653 | 59.0 | 354 | 8.2800 | 0.0423 | 82.0476 |
| 3.3158 | 60.0 | 360 | 8.2560 | 0.0437 | 116.4762 |
| 2.5549 | 61.0 | 366 | 8.2070 | 0.0348 | 164.2857 |
| 2.9411 | 62.0 | 372 | 8.2850 | 0.3249 | 12.381 |
| 2.965 | 63.0 | 378 | 8.3497 | 0.0352 | 117.1429 |
| 3.4553 | 64.0 | 384 | 8.3532 | 0.0739 | 145.9524 |
| 3.1656 | 65.0 | 390 | 8.3229 | 0.1993 | 102.5714 |
| 3.3285 | 66.0 | 396 | 8.3454 | 0.2297 | 46.9524 |
| 2.7365 | 67.0 | 402 | 8.4989 | 0.2246 | 39.381 |
| 3.1372 | 68.0 | 408 | 8.4935 | 0.0444 | 115.2381 |
| 2.3018 | 69.0 | 414 | 8.4543 | 0.0552 | 113.8571 |
| 2.5972 | 70.0 | 420 | 8.4092 | 0.245 | 15.3333 |
| 5.2476 | 71.0 | 426 | 8.3573 | 0.2629 | 32.0476 |
| 2.4894 | 72.0 | 432 | 8.3228 | 0.2863 | 42.5238 |
| 3.9303 | 73.0 | 438 | 8.3295 | 0.5382 | 36.7619 |
| 3.8135 | 74.0 | 444 | 8.3803 | 0.2421 | 41.8095 |
| 2.36 | 75.0 | 450 | 8.4558 | 0.1325 | 58.381 |
| 2.7095 | 76.0 | 456 | 8.5280 | 0.2592 | 68.9524 |
| 2.0011 | 77.0 | 462 | 8.4020 | 0.2997 | 58.2381 |
| 1.9209 | 78.0 | 468 | 8.4449 | 0.1838 | 43.7143 |
| 3.3766 | 79.0 | 474 | 8.5564 | 0.2789 | 24.9048 |
| 3.4283 | 80.0 | 480 | 8.5476 | 0.264 | 35.7143 |
| 2.8935 | 81.0 | 486 | 8.5057 | 0.0633 | 79.8095 |
| 2.5961 | 82.0 | 492 | 8.4756 | 0.0648 | 92.9524 |
| 3.999 | 83.0 | 498 | 8.4273 | 0.1558 | 68.4286 |
| 3.612 | 84.0 | 504 | 8.3825 | 0.1379 | 52.9524 |
| 2.5813 | 85.0 | 510 | 8.3289 | 0.1275 | 42.0 |
| 2.8265 | 86.0 | 516 | 8.3150 | 0.2806 | 22.9048 |
| 3.1955 | 87.0 | 522 | 8.3218 | 0.2976 | 17.4762 |
| 2.7654 | 88.0 | 528 | 8.3135 | 0.2878 | 35.619 |
| 3.7539 | 89.0 | 534 | 8.3157 | 0.0896 | 48.4762 |
| 1.8882 | 90.0 | 540 | 8.3397 | 0.0897 | 57.7619 |
| 2.5795 | 91.0 | 546 | 8.3700 | 0.069 | 79.1905 |
| 1.9473 | 92.0 | 552 | 8.4195 | 0.1347 | 152.4762 |
| 2.349 | 93.0 | 558 | 8.4513 | 0.0239 | 183.619 |
| 3.1561 | 94.0 | 564 | 8.4664 | 0.0234 | 192.4286 |
| 2.9355 | 95.0 | 570 | 8.4679 | 0.1186 | 167.8571 |
| 2.5661 | 96.0 | 576 | 8.4588 | 0.1833 | 110.9524 |
| 3.1005 | 97.0 | 582 | 8.4478 | 0.0432 | 124.8571 |
| 2.7184 | 98.0 | 588 | 8.4399 | 0.0589 | 84.9048 |
| 2.8431 | 99.0 | 594 | 8.4340 | 0.1961 | 103.9524 |
| 2.9269 | 100.0 | 600 | 8.4343 | 0.0488 | 93.2857 |
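
The card does not state how Bleu and Gen Len were computed. A plausible sketch, assuming the sacrebleu recipe from the stock translation fine-tuning scripts and the `tokenizer` loaded in the earlier example:

```python
# Hedged sketch of a compute_metrics function that would yield the
# Bleu and Gen Len columns above; the exact recipe is not confirmed
# by the card. Assumes `tokenizer` is in scope and predictions come
# from generation (predict_with_generate=True).
import numpy as np
import evaluate

bleu = evaluate.load("sacrebleu")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Labels are padded with -100; swap that back to the pad token id.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = bleu.compute(
        predictions=decoded_preds,
        references=[[label] for label in decoded_labels],
    )
    gen_len = np.mean(
        [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in preds]
    )
    return {"bleu": result["score"], "gen_len": gen_len}
```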

Framework versions

  • Transformers 4.35.2
  • Pytorch 1.13.1+cu117
  • Datasets 2.16.1
  • Tokenizers 0.15.0
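
For reproduction, a quick check that the installed packages match the versions above; the package names are the standard PyPI ones:

```python
# Compare installed versions against the ones this card was trained with.
import datasets
import tokenizers
import torch
import transformers

pinned = {
    "transformers": "4.35.2",
    "torch": "1.13.1+cu117",
    "datasets": "2.16.1",
    "tokenizers": "0.15.0",
}
installed = {
    "transformers": transformers.__version__,
    "torch": torch.__version__,
    "datasets": datasets.__version__,
    "tokenizers": tokenizers.__version__,
}
for name, wanted in pinned.items():
    print(f"{name}: installed {installed[name]}, card used {wanted}")
```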