t5-small_6_3-en-hi_en_LinCE

This model was trained from scratch; the card does not name the training dataset (the dataset field reads `None`). It achieves the following results on the evaluation set:

Loss: 2.2034
Bleu: 7.8135
Gen Len: 39.5564

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0001
train_batch_size: 8
eval_batch_size: 8
seed: 42
gradient_accumulation_steps: 8
total_train_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50
mixed_precision_training: Native AMP
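
The relationship between the batch settings above, and the shape of a linear learning-rate schedule, can be sketched in a few lines. This is an illustrative calculation only, not the training script: it assumes a single device (so the effective batch is the per-device batch times the accumulation steps) and no warmup, neither of which the card states. The helper name `linear_lr` is hypothetical, and the total of 4700 steps is taken from the last row of the training results.

```python
# Values copied from the hyperparameter list.
learning_rate = 1e-4
train_batch_size = 8
gradient_accumulation_steps = 8
total_steps = 4700  # final step reported in the training results

# Assumption: one device, so effective batch = per-device batch x accumulation steps.
effective_batch_size = train_batch_size * gradient_accumulation_steps


def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup=0):
    """Linear warmup followed by linear decay to zero.

    warmup=0 is an assumption; the card does not list warmup steps.
    """
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print("effective batch size:", effective_batch_size)
for step in (0, 2350, 4700):
    print(step, linear_lr(step))
```

Under these assumptions the effective batch size matches the reported total_train_batch_size of 64, and the learning rate falls from 0.0001 at step 0 to zero at step 4700.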

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
No log	0.99	94	3.5424	0.9187	16.7437
No log	1.99	188	3.1434	1.2886	16.8158
No log	2.99	282	2.9494	1.4577	16.7824
No log	3.99	376	2.8233	1.4745	16.8879
No log	4.99	470	2.7300	1.7116	16.6636
3.6303	5.99	564	2.6589	1.7857	16.6302
3.6303	6.99	658	2.6005	1.8572	16.4553
3.6303	7.99	752	2.5456	2.139	16.3925
3.6303	8.99	846	2.5023	2.3835	16.2911
3.6303	9.99	940	2.4725	2.5607	16.3271
2.9087	10.99	1034	2.4272	2.6614	16.3138
2.9087	11.99	1128	2.3977	2.9623	16.3338
2.9087	12.99	1222	2.3686	3.1248	16.2443
2.9087	13.99	1316	2.3438	3.3294	16.3458
2.9087	14.99	1410	2.3253	3.3885	16.3591
2.6588	15.99	1504	2.3028	3.3985	16.3124
2.6588	16.99	1598	2.2839	3.3772	16.3858
2.6588	17.99	1692	2.2704	3.5804	16.3872
2.6588	18.99	1786	2.2533	3.8751	16.2697
2.6588	19.99	1880	2.2378	4.0003	16.3271
2.6588	20.99	1974	2.2233	4.0271	16.3031
2.5079	21.99	2068	2.2160	4.1898	16.3057
2.5079	22.99	2162	2.2010	4.1216	16.3031
2.5079	23.99	2256	2.1935	4.1311	16.2644
2.5079	24.99	2350	2.1833	4.1373	16.3138
2.5079	25.99	2444	2.1725	4.3471	16.3057
2.4027	26.99	2538	2.1657	4.183	16.3298
2.4027	27.99	2632	2.1611	4.2867	16.3351
2.4027	28.99	2726	2.1531	4.2689	16.2737
2.4027	29.99	2820	2.1482	4.4802	16.2644
2.4027	30.99	2914	2.1443	4.469	16.231
2.3251	31.99	3008	2.1375	4.5295	16.227
2.3251	32.99	3102	2.1330	4.4799	16.2243
2.3251	33.99	3196	2.1307	4.7124	16.2417
2.3251	34.99	3290	2.1248	4.5954	16.3004
2.3251	35.99	3384	2.1215	4.7455	16.215
2.3251	36.99	3478	2.1166	4.6233	16.2016
2.2818	37.99	3572	2.1147	4.6843	16.219
2.2818	38.99	3666	2.1112	4.7068	16.2163
2.2818	39.99	3760	2.1071	4.684	16.223
2.2818	40.99	3854	2.1034	4.7323	16.2523
2.2818	41.99	3948	2.0998	4.6406	16.2016
2.2392	42.99	4042	2.1017	4.7609	16.1976
2.2392	43.99	4136	2.1021	4.7634	16.2069
2.2392	44.99	4230	2.0994	4.7854	16.1976
2.2392	45.99	4324	2.0980	4.7562	16.2243
2.2392	46.99	4418	2.0964	4.7921	16.219
2.2192	47.99	4512	2.0970	4.8029	16.2377
2.2192	48.99	4606	2.0967	4.7953	16.2176
2.2192	49.99	4700	2.0968	4.819	16.2457
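
To pick a checkpoint from a log like this, it can help to parse the rows and compare metrics directly. The sketch below is a hypothetical helper, not part of the original training code; it embeds four rows copied from the table (tab-separated, with "No log" treated as a missing training loss) and reports which of them has the lowest validation loss and which has the highest BLEU. Pasting the full table into `ROWS` would analyse every epoch.

```python
# Four rows copied from the training results table; columns are
# training loss, epoch, step, validation loss, BLEU, generation length.
ROWS = """\
No log\t0.99\t94\t3.5424\t0.9187\t16.7437
3.6303\t5.99\t564\t2.6589\t1.7857\t16.6302
2.2392\t46.99\t4418\t2.0964\t4.7921\t16.219
2.2192\t49.99\t4700\t2.0968\t4.819\t16.2457
"""


def parse_row(line):
    """Turn one tab-separated log row into a dict of typed values."""
    train_loss, epoch, step, val_loss, bleu, gen_len = line.split("\t")
    return {
        # "No log" means no training loss had been logged yet at that step.
        "train_loss": None if train_loss == "No log" else float(train_loss),
        "epoch": float(epoch),
        "step": int(step),
        "val_loss": float(val_loss),
        "bleu": float(bleu),
        "gen_len": float(gen_len),
    }


rows = [parse_row(line) for line in ROWS.splitlines() if line.strip()]
best_loss = min(rows, key=lambda r: r["val_loss"])
best_bleu = max(rows, key=lambda r: r["bleu"])

print("lowest validation loss:", best_loss["step"], best_loss["val_loss"])
print("highest BLEU:", best_bleu["step"], best_bleu["bleu"])
```

In the table, the lowest validation loss (2.0964) is at step 4418 while the highest BLEU (4.819) is at step 4700, so the two criteria select different checkpoints.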

Framework versions

Transformers 4.20.0.dev0
PyTorch 1.8.0
Datasets 2.1.0
Tokenizers 0.12.1

sayanmandal/t5-small_6_3-en-hi_en_LinCE
