
pretrain_custom_tokenizer

This model was trained from scratch on the code_search_net dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9012
  • Bleu: 0.0437
  • Precisions (1- to 4-gram): [0.17073810819731178, 0.05349823043007888, 0.026839997681805762, 0.014878806668698525]
  • Brevity Penalty: 1.0
  • Length Ratio: 1.8881
  • Translation Length: 1493697
  • Reference Length: 791127
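
These fields match the output schema of the Hugging Face `evaluate` BLEU metric. As a minimal sketch of how such numbers can be produced (the actual evaluation script is not included in this card, so the inputs below are made up):

```python
# Hypothetical sketch: the "bleu" metric in the `evaluate` library returns
# exactly the keys reported above: bleu, precisions (1- to 4-gram),
# brevity_penalty, length_ratio, translation_length, reference_length.
import evaluate

bleu = evaluate.load("bleu")

predictions = ["Returns the sum of two numbers."]             # model outputs (made-up example)
references = [["Return the sum of the two input numbers."]]   # gold docstrings (made-up example)

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"], results["precisions"], results["brevity_penalty"])
```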

Model description

From the repository metadata: 223M parameters, F32 tensors, stored in the Safetensors format. More information needed.

Intended uses & limitations

More information needed
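
The card does not state the architecture or the intended task. Given the BLEU-based evaluation on code_search_net, the checkpoint is plausibly a sequence-to-sequence model (e.g. code-to-docstring generation); a loading sketch under that assumption:

```python
# Assumption-laden sketch: the card does not say which architecture this is.
# AutoModelForSeq2SeqLM is a guess based on the BLEU/translation-length
# metrics; swap in AutoModelForCausalLM or AutoModelForMaskedLM if the
# checkpoint's config says otherwise.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("sc20fg/pretrain_custom_tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("sc20fg/pretrain_custom_tokenizer")

inputs = tokenizer("def add(a, b):\n    return a + b", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```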

Training and evaluation data

More information needed
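
The code_search_net dataset named above is available on the Hugging Face Hub; the card does not say which language subset(s) were used. A sketch of loading it, assuming the Python subset:

```python
# Sketch only: loading code_search_net from the Hub. The "python"
# configuration is an assumption; the card does not state which subset(s)
# were used for training and evaluation.
from datasets import load_dataset

ds = load_dataset("code_search_net", "python")
print(ds["train"][0]["func_code_string"])           # source code of one function
print(ds["train"][0]["func_documentation_string"])  # its paired docstring
```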

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
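
A minimal sketch of how these values map onto `transformers.TrainingArguments`; the original training script is not part of this card, and `output_dir` is a placeholder:

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# Anything not listed in the card is a placeholder or library default.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pretrain_custom_tokenizer",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,                       # Adam with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=20,
)
```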

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Precisions (1- to 4-gram) | Brevity Penalty | Length Ratio | Translation Length | Reference Length |
|:-------------|:-----|:-----|:----------------|:-----|:--------------------------|:----------------|:-------------|:-------------------|:-----------------|
| 3.87 | 1.0 | 25762 | 3.7713 | 0.0330 | [0.13936095014631156, 0.040556966110672256, 0.01980787640709075, 0.010544173702481034] | 1.0 | 1.9706 | 1559002 | 791127 |
| 3.6512 | 2.0 | 51524 | 3.5106 | 0.0296 | [0.12352037416285759, 0.03645373394602681, 0.01784718216329517, 0.009590691394861035] | 1.0 | 2.1794 | 1724169 | 791127 |
| 3.5043 | 3.0 | 77286 | 3.3681 | 0.0366 | [0.14769295867930776, 0.04478064762804751, 0.022321614312460995, 0.01211843395693241] | 1.0 | 2.0612 | 1630660 | 791127 |
| 3.3524 | 4.0 | 103048 | 3.2651 | 0.0373 | [0.15228345344701566, 0.04586568142634442, 0.02256759313264377, 0.012219488476642417] | 1.0 | 1.9870 | 1571983 | 791127 |
| 3.2746 | 5.0 | 128810 | 3.1935 | 0.0384 | [0.1523531796659091, 0.04687264288515885, 0.023561239014433664, 0.012935446265137207] | 1.0 | 2.0390 | 1613094 | 791127 |
| 3.2305 | 6.0 | 154572 | 3.1368 | 0.0387 | [0.1567301269740848, 0.047534152418592664, 0.023522792038785066, 0.012842802012275794] | 1.0 | 1.9700 | 1558507 | 791127 |
| 3.1199 | 7.0 | 180334 | 3.0924 | 0.0406 | [0.16104313669485146, 0.0497667381497795, 0.02487902888463687, 0.013686010776483513] | 1.0 | 1.9295 | 1526473 | 791127 |
| 3.1476 | 8.0 | 206096 | 3.0537 | 0.0416 | [0.16303145796074886, 0.050591823896046224, 0.025582405968043283, 0.014183117767188563] | 1.0 | 1.9408 | 1535446 | 791127 |
| 3.031 | 9.0 | 231858 | 3.0262 | 0.0424 | [0.16684712738332408, 0.051844235468668176, 0.026003093150347323, 0.01442481092789985] | 1.0 | 1.8818 | 1488782 | 791127 |
| 3.0243 | 10.0 | 257620 | 3.0003 | 0.0420 | [0.16607697592198523, 0.05141761221771236, 0.025742386869365745, 0.014193381846444359] | 1.0 | 1.8859 | 1492025 | 791127 |
| 3.0343 | 11.0 | 283382 | 2.9777 | 0.0428 | [0.1691752170217323, 0.052193166007217705, 0.026141681013552288, 0.014484642594473435] | 1.0 | 1.8886 | 1494090 | 791127 |
| 2.9652 | 12.0 | 309144 | 2.9615 | 0.0428 | [0.16823973933395542, 0.052320879613441, 0.026187658344737696, 0.014502132420939685] | 1.0 | 1.9005 | 1503533 | 791127 |
| 2.9981 | 13.0 | 334906 | 2.9445 | 0.0437 | [0.16985697826461554, 0.05332124116669285, 0.02686760130903802, 0.014972828451699309] | 1.0 | 1.8706 | 1479845 | 791127 |
| 2.941 | 14.0 | 360668 | 2.9335 | 0.0432 | [0.17029332390165305, 0.052870421070299455, 0.02642667143527729, 0.014670292675070435] | 1.0 | 1.8655 | 1475877 | 791127 |
| 2.8816 | 15.0 | 386430 | 2.9228 | 0.0437 | [0.1712148556231003, 0.053400515923720436, 0.026818846008300055, 0.014919424168060396] | 1.0 | 1.8631 | 1473920 | 791127 |
| 2.9124 | 16.0 | 412192 | 2.9150 | 0.0435 | [0.17018986135899558, 0.05329713851829526, 0.02675404721408581, 0.014813441829455485] | 1.0 | 1.8775 | 1485347 | 791127 |
| 2.9019 | 17.0 | 437954 | 2.9091 | 0.0433 | [0.17013412808717324, 0.053090832561091934, 0.026579011940635933, 0.014667138889511053] | 1.0 | 1.8899 | 1495138 | 791127 |
| 2.8737 | 18.0 | 463716 | 2.9044 | 0.0438 | [0.17056889528815428, 0.05354494235512719, 0.026930478436931807, 0.014909059846295177] | 1.0 | 1.8892 | 1494616 | 791127 |
| 2.9192 | 19.0 | 489478 | 2.9019 | 0.0439 | [0.1710080779910644, 0.05364458098047096, 0.02695562446920049, 0.014976875182171591] | 1.0 | 1.8837 | 1490222 | 791127 |
| 2.8501 | 20.0 | 515240 | 2.9012 | 0.0437 | [0.17073810819731178, 0.05349823043007888, 0.026839997681805762, 0.014878806668698525] | 1.0 | 1.8881 | 1493697 | 791127 |

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2
