
pretrain_custom_tokenizer

This model was trained from scratch on the code_search_net dataset. It achieves the following results on the evaluation set:

  • Loss: 2.9012
  • Bleu: 0.0437
  • Precisions (1- to 4-gram): [0.17073810819731178, 0.05349823043007888, 0.026839997681805762, 0.014878806668698525]
  • Brevity Penalty: 1.0
  • Length Ratio: 1.8881
  • Translation Length: 1493697
  • Reference Length: 791127
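
These fields match the output schema of the Hugging Face `evaluate` BLEU metric. As a minimal sketch of how such numbers can be produced (the actual evaluation script is not included in this card, so the inputs below are made up):

```python
# Hypothetical sketch: the "bleu" metric in the `evaluate` library returns
# exactly the keys reported above: bleu, precisions (1- to 4-gram),
# brevity_penalty, length_ratio, translation_length, reference_length.
import evaluate

bleu = evaluate.load("bleu")

predictions = ["Returns the sum of two numbers."]             # model outputs (made-up example)
references = [["Return the sum of the two input numbers."]]   # gold docstrings (made-up example)

results = bleu.compute(predictions=predictions, references=references)
print(results["bleu"], results["precisions"], results["brevity_penalty"])
```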

Model description

From the repository metadata: 223M parameters, F32 tensors, stored in the Safetensors format. More information needed.

Intended uses & limitations

More information needed
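
The card does not state the architecture or the intended task. Given the BLEU-based evaluation on code_search_net, the checkpoint is plausibly a sequence-to-sequence model (e.g. code-to-docstring generation); a loading sketch under that assumption:

```python
# Assumption-laden sketch: the card does not say which architecture this is.
# AutoModelForSeq2SeqLM is a guess based on the BLEU/translation-length
# metrics; swap in AutoModelForCausalLM or AutoModelForMaskedLM if the
# checkpoint's config says otherwise.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("sc20fg/pretrain_custom_tokenizer")
model = AutoModelForSeq2SeqLM.from_pretrained("sc20fg/pretrain_custom_tokenizer")

inputs = tokenizer("def add(a, b):\n    return a + b", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```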

Training and evaluation data

More information needed
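
The code_search_net dataset named above is available on the Hugging Face Hub; the card does not say which language subset(s) were used. A sketch of loading it, assuming the Python subset:

```python
# Sketch only: loading code_search_net from the Hub. The "python"
# configuration is an assumption; the card does not state which subset(s)
# were used for training and evaluation.
from datasets import load_dataset

ds = load_dataset("code_search_net", "python")
print(ds["train"][0]["func_code_string"])           # source code of one function
print(ds["train"][0]["func_documentation_string"])  # its paired docstring
```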

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
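
A minimal sketch of how these values map onto `transformers.TrainingArguments`; the original training script is not part of this card, and `output_dir` is a placeholder:

```python
# Sketch: the hyperparameters above expressed as TrainingArguments.
# Anything not listed in the card is a placeholder or library default.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="pretrain_custom_tokenizer",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,                       # Adam with betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="linear",
    num_train_epochs=20,
)
```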

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Precisions (1- to 4-gram) | Brevity Penalty | Length Ratio | Translation Length | Reference Length |
|:-------------|:-----|:-----|:----------------|:-----|:--------------------------|:----------------|:-------------|:-------------------|:-----------------|
| 3.87 | 1.0 | 25762 | 3.7713 | 0.0330 | [0.13936095014631156, 0.040556966110672256, 0.01980787640709075, 0.010544173702481034] | 1.0 | 1.9706 | 1559002 | 791127 |
| 3.6512 | 2.0 | 51524 | 3.5106 | 0.0296 | [0.12352037416285759, 0.03645373394602681, 0.01784718216329517, 0.009590691394861035] | 1.0 | 2.1794 | 1724169 | 791127 |
| 3.5043 | 3.0 | 77286 | 3.3681 | 0.0366 | [0.14769295867930776, 0.04478064762804751, 0.022321614312460995, 0.01211843395693241] | 1.0 | 2.0612 | 1630660 | 791127 |
| 3.3524 | 4.0 | 103048 | 3.2651 | 0.0373 | [0.15228345344701566, 0.04586568142634442, 0.02256759313264377, 0.012219488476642417] | 1.0 | 1.9870 | 1571983 | 791127 |
| 3.2746 | 5.0 | 128810 | 3.1935 | 0.0384 | [0.1523531796659091, 0.04687264288515885, 0.023561239014433664, 0.012935446265137207] | 1.0 | 2.0390 | 1613094 | 791127 |
| 3.2305 | 6.0 | 154572 | 3.1368 | 0.0387 | [0.1567301269740848, 0.047534152418592664, 0.023522792038785066, 0.012842802012275794] | 1.0 | 1.9700 | 1558507 | 791127 |
| 3.1199 | 7.0 | 180334 | 3.0924 | 0.0406 | [0.16104313669485146, 0.0497667381497795, 0.02487902888463687, 0.013686010776483513] | 1.0 | 1.9295 | 1526473 | 791127 |
| 3.1476 | 8.0 | 206096 | 3.0537 | 0.0416 | [0.16303145796074886, 0.050591823896046224, 0.025582405968043283, 0.014183117767188563] | 1.0 | 1.9408 | 1535446 | 791127 |
| 3.031 | 9.0 | 231858 | 3.0262 | 0.0424 | [0.16684712738332408, 0.051844235468668176, 0.026003093150347323, 0.01442481092789985] | 1.0 | 1.8818 | 1488782 | 791127 |
| 3.0243 | 10.0 | 257620 | 3.0003 | 0.0420 | [0.16607697592198523, 0.05141761221771236, 0.025742386869365745, 0.014193381846444359] | 1.0 | 1.8859 | 1492025 | 791127 |
| 3.0343 | 11.0 | 283382 | 2.9777 | 0.0428 | [0.1691752170217323, 0.052193166007217705, 0.026141681013552288, 0.014484642594473435] | 1.0 | 1.8886 | 1494090 | 791127 |
| 2.9652 | 12.0 | 309144 | 2.9615 | 0.0428 | [0.16823973933395542, 0.052320879613441, 0.026187658344737696, 0.014502132420939685] | 1.0 | 1.9005 | 1503533 | 791127 |
| 2.9981 | 13.0 | 334906 | 2.9445 | 0.0437 | [0.16985697826461554, 0.05332124116669285, 0.02686760130903802, 0.014972828451699309] | 1.0 | 1.8706 | 1479845 | 791127 |
| 2.941 | 14.0 | 360668 | 2.9335 | 0.0432 | [0.17029332390165305, 0.052870421070299455, 0.02642667143527729, 0.014670292675070435] | 1.0 | 1.8655 | 1475877 | 791127 |
| 2.8816 | 15.0 | 386430 | 2.9228 | 0.0437 | [0.1712148556231003, 0.053400515923720436, 0.026818846008300055, 0.014919424168060396] | 1.0 | 1.8631 | 1473920 | 791127 |
| 2.9124 | 16.0 | 412192 | 2.9150 | 0.0435 | [0.17018986135899558, 0.05329713851829526, 0.02675404721408581, 0.014813441829455485] | 1.0 | 1.8775 | 1485347 | 791127 |
| 2.9019 | 17.0 | 437954 | 2.9091 | 0.0433 | [0.17013412808717324, 0.053090832561091934, 0.026579011940635933, 0.014667138889511053] | 1.0 | 1.8899 | 1495138 | 791127 |
| 2.8737 | 18.0 | 463716 | 2.9044 | 0.0438 | [0.17056889528815428, 0.05354494235512719, 0.026930478436931807, 0.014909059846295177] | 1.0 | 1.8892 | 1494616 | 791127 |
| 2.9192 | 19.0 | 489478 | 2.9019 | 0.0439 | [0.1710080779910644, 0.05364458098047096, 0.02695562446920049, 0.014976875182171591] | 1.0 | 1.8837 | 1490222 | 791127 |
| 2.8501 | 20.0 | 515240 | 2.9012 | 0.0437 | [0.17073810819731178, 0.05349823043007888, 0.026839997681805762, 0.014878806668698525] | 1.0 | 1.8881 | 1493697 | 791127 |

Framework versions

  • Transformers 4.37.2
  • Pytorch 2.2.0+cu121
  • Datasets 2.17.0
  • Tokenizers 0.15.2
