
pretrain_base_tokenizer

This model was trained from scratch on the code_search_net dataset. It achieves the following results on the evaluation set:

  • Loss: 2.1008
  • Bleu: 0.0745
  • Precisions (1- to 4-gram): [0.370227852188713, 0.13803247473556413, 0.07398987834019316, 0.04421999242711094]
  • Brevity Penalty: 0.6551
  • Length Ratio: 0.7028
  • Translation Length: 594596
  • Reference Length: 846059
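
The card does not name the metric implementation, but the reported fields (Bleu, Precisions, Brevity Penalty, Length Ratio, Translation Length, Reference Length) match the output of the BLEU metric in the Hugging Face evaluate library. A minimal reproduction sketch under that assumption, with made-up strings standing in for real model outputs:

```python
import evaluate

# Load the BLEU metric; its output dictionary mirrors the fields above.
bleu = evaluate.load("bleu")

predictions = ["Return the sum of two numbers ."]       # hypothetical model output
references = [["Return the sum of the two numbers ."]]  # hypothetical reference

result = bleu.compute(predictions=predictions, references=references)
# result["bleu"]             overall score
# result["precisions"]       1- to 4-gram precisions
# result["brevity_penalty"]  exp(1 - reference_length / translation_length)
#                            when outputs are shorter than references, else 1.0
# result["length_ratio"]     translation_length / reference_length
print(result)
```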

Model description

More information needed

Intended uses & limitations

More information needed
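
No usage example is given. As a starting point, here is a minimal loading sketch, assuming a sequence-to-sequence architecture (the BLEU evaluation suggests a text-generation task such as code-to-documentation on code_search_net); the model ID comes from the card header and the input string is made up:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "sc20fg/pretrain_base_tokenizer"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)  # assumes a seq2seq checkpoint

code = "def add(a, b):\n    return a + b"  # hypothetical input
inputs = tokenizer(code, return_tensors="pt", truncation=True)
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```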

Training and evaluation data

More information needed
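
Per the header, training used the code_search_net dataset. A loading sketch with the datasets library; the "python" configuration and the column names are assumptions based on the public code_search_net loading script, since the card does not say which language subset(s) were used:

```python
from datasets import load_dataset

# "python" is one of several code_search_net configurations
# ("all", "java", "go", "python", "javascript", "ruby", "php").
dataset = load_dataset("code_search_net", "python")

example = dataset["train"][0]
print(example["whole_func_string"])          # function source code
print(example["func_documentation_string"])  # paired docstring
```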

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
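
A sketch of how these hyperparameters map onto Transformers training arguments; output_dir is a placeholder, and evaluation_strategy="epoch" and predict_with_generate=True are inferred from the per-epoch BLEU results below rather than stated on the card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="pretrain_base_tokenizer",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,               # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,            # epsilon=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=20,
    evaluation_strategy="epoch",  # inferred: results are logged once per epoch
    predict_with_generate=True,   # inferred: BLEU needs generated sequences
)
```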

Training results

| Training Loss | Epoch | Step | Bleu | Brevity Penalty | Length Ratio | Validation Loss | Precisions | Reference Length | Translation Length |
|---|---|---|---|---|---|---|---|---|---|
| 2.4112 | 1.0 | 25762 | 0.0658 | 0.6617 | 0.7078 | 2.3310 | [0.35249982048117723, 0.12327087650542343, 0.06319307060145318, 0.03566263068721848] | 846059 | 598823 |
| 2.3334 | 2.0 | 51524 | 0.0681 | 0.6617 | 0.7078 | 2.2582 | [0.35832782242172834, 0.127192419612726, 0.06572103592555187, 0.03742664572798245] | 846059 | 598812 |
| 2.2441 | 3.0 | 77286 | 0.0696 | 0.6529 | 0.7011 | 2.2180 | [0.36256557741844125, 0.13175932444407085, 0.06865854925017445, 0.039288874037902925] | 846059 | 593192 |
| 2.1798 | 4.0 | 103048 | 0.0721 | 0.6729 | 0.7162 | 2.1907 | [0.36450348611741407, 0.13225409723744794, 0.06903673685953533, 0.0396539707725045] | 846059 | 605975 |
| 2.1424 | 5.0 | 128810 | 0.0715 | 0.6561 | 0.7035 | 2.1736 | [0.3668020289217111, 0.13425532471838544, 0.07022789074894986, 0.04068744822577534] | 846059 | 595193 |
| 2.1132 | 6.0 | 154572 | 0.0739 | 0.6875 | 0.7275 | 2.1539 | [0.36025300866163096, 0.13232255476642318, 0.06955911290379053, 0.040195441044440436] | 846059 | 615473 |
| 2.0984 | 7.0 | 180334 | 0.0721 | 0.6587 | 0.7055 | 2.1471 | [0.36612131721578584, 0.13431329561035363, 0.0708157263719857, 0.041272288902252124] | 846059 | 596865 |
| 2.0785 | 8.0 | 206096 | 0.0724 | 0.6756 | 0.7183 | 2.1353 | [0.36380808213768595, 0.13209779987841613, 0.06888583628832168, 0.039797612956022015] | 846059 | 607760 |
| 2.044 | 9.0 | 231858 | 0.0651 | 0.5890 | 0.6539 | 2.1307 | [0.3747329223983659, 0.13597984423722942, 0.07083334152049311, 0.041232475735633954] | 846059 | 553210 |
| 2.0022 | 10.0 | 257620 | 0.0678 | 0.6182 | 0.6752 | 2.1244 | [0.37057115300122706, 0.13501863826827087, 0.07054691458053057, 0.04094138244503552] | 846059 | 571283 |
| 2.0115 | 11.0 | 283382 | 0.0714 | 0.6437 | 0.6942 | 2.1181 | [0.3696569336851962, 0.1359002395637604, 0.07172057187893609, 0.04198004369041997] | 846059 | 587350 |
| 1.9957 | 12.0 | 309144 | 0.0780 | 0.7340 | 0.7638 | 2.1182 | [0.3562361599633563, 0.13051385463885645, 0.06873666863799165, 0.039808098889084334] | 846059 | 646223 |
| 1.9816 | 13.0 | 334906 | 0.0748 | 0.6775 | 0.7198 | 2.1112 | [0.3644272643077186, 0.1348813193666958, 0.07171355661769002, 0.04221956829440906] | 846059 | 608972 |
| 1.9799 | 14.0 | 360668 | 0.0729 | 0.6567 | 0.7039 | 2.1094 | [0.3683080239907046, 0.1360146909050323, 0.07189528256366162, 0.04211029597965069] | 846059 | 595564 |
| 1.9721 | 15.0 | 386430 | 0.0724 | 0.6428 | 0.6935 | 2.1035 | [0.37174066063670774, 0.13775257176864, 0.07323700636731323, 0.042981616643797974] | 846059 | 586737 |
| 1.9415 | 16.0 | 412192 | 0.0707 | 0.6275 | 0.6822 | 2.1052 | [0.37395952455210174, 0.1379846553581918, 0.07303615398474567, 0.04286080713028393] | 846059 | 577140 |
| 1.921 | 17.0 | 437954 | 0.0755 | 0.6693 | 0.7135 | 2.1031 | [0.368775375991227, 0.137205943045811, 0.07336018463397673, 0.04358297628398173] | 846059 | 603671 |
| 1.9281 | 18.0 | 463716 | 0.0730 | 0.6426 | 0.6934 | 2.1008 | [0.3719142378879256, 0.13830769094500395, 0.0738741230374939, 0.043850156367431746] | 846059 | 586646 |
| 1.9619 | 19.0 | 489478 | 0.0741 | 0.6539 | 0.7019 | 2.1011 | [0.3690046967173499, 0.1375453499071136, 0.07371470581812624, 0.044035281313872486] | 846059 | 593819 |
| 1.9177 | 20.0 | 515240 | 0.0745 | 0.6551 | 0.7028 | 2.1008 | [0.370227852188713, 0.13803247473556413, 0.07398987834019316, 0.04421999242711094] | 846059 | 594596 |

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2

Model size

  • 223M parameters (F32, Safetensors)