
t5-small_6_3-en-hi_en_bt

This model was trained from scratch; the training dataset is not specified in this card. It achieves the following results on the evaluation set:

  • Loss: 1.9293
  • Bleu: 8.9676
  • Gen Len: 33.391
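The BLEU figure above is a corpus-level score. As a reference for how such a score is computed, here is a minimal pure-Python sketch of the standard clipped n-gram precision formula with a brevity penalty (uniform weights, single reference, no smoothing). The score reported for this model was presumably produced by a library such as sacreBLEU, whose tokenization and smoothing may differ, so this is an illustration, not the exact metric used:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    # All contiguous n-grams of a token list, with multiplicities.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU: clipped n-gram precisions (n = 1..max_n),
    geometric mean, brevity penalty. Single reference per hypothesis."""
    match = [0] * max_n   # clipped n-gram matches, per order
    total = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ng, r_ng = ngrams(h, n), ngrams(r, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            match[n - 1] += sum(min(c, r_ng[g]) for g, c in h_ng.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:
        return 0.0  # unsmoothed BLEU is zero if any order has no match
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references.
    bp = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_prec)

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```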

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 50
  • mixed_precision_training: Native AMP
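The batch-size and step counts above are mutually consistent. A quick pure-Python sketch checks the arithmetic and shows the shape of the linear schedule; the warmup of 0 steps is an assumption, since the card does not state a warmup setting:

```python
# Hyperparameters copied from the card.
learning_rate = 1e-4
train_batch_size = 8
gradient_accumulation_steps = 8
num_epochs = 50

# Effective batch size = per-device batch size x gradient accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 64

# The training-results table logs 526 optimizer steps per epoch.
steps_per_epoch = 526
total_steps = steps_per_epoch * num_epochs  # 26300, matching the final row

def linear_lr(step, base_lr=learning_rate, total=total_steps, warmup=0):
    """Linear schedule: ramp up over `warmup` steps, then decay to zero.
    (warmup=0 is an assumption; the card does not specify warmup.)"""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))

print(total_train_batch_size)  # 64
print(linear_lr(13150))        # halfway through training -> 5e-05
```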

Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu   | Gen Len |
|:-------------:|:-----:|:-----:|:---------------:|:------:|:-------:|
| 3.7929        | 1.0   | 526   | 2.6759          | 1.5672 | 16.749  |
| 3.1151        | 2.0   | 1052  | 2.3843          | 2.2962 | 16.5287 |
| 2.8701        | 3.0   | 1578  | 2.2287          | 2.8811 | 16.4953 |
| 2.7121        | 4.0   | 2104  | 2.1302          | 3.3949 | 16.5247 |
| 2.5844        | 5.0   | 2630  | 2.0593          | 3.8161 | 16.4513 |
| 2.4917        | 6.0   | 3156  | 2.0063          | 3.9831 | 16.4272 |
| 2.4067        | 7.0   | 3682  | 1.9733          | 4.0511 | 16.3378 |
| 2.3395        | 8.0   | 4208  | 1.9399          | 4.3067 | 16.4112 |
| 2.2713        | 9.0   | 4734  | 1.9148          | 4.3195 | 16.3618 |
| 2.2217        | 10.0  | 5260  | 1.8961          | 4.3905 | 16.4112 |
| 2.1659        | 11.0  | 5786  | 1.8787          | 4.4548 | 16.3298 |
| 2.1267        | 12.0  | 6312  | 1.8651          | 4.5779 | 16.3618 |
| 2.0793        | 13.0  | 6838  | 1.8540          | 4.4863 | 16.2603 |
| 2.0473        | 14.0  | 7364  | 1.8444          | 4.556  | 16.3044 |
| 2.0082        | 15.0  | 7890  | 1.8353          | 4.5957 | 16.3124 |
| 1.9748        | 16.0  | 8416  | 1.8313          | 4.5593 | 16.3204 |
| 1.9456        | 17.0  | 8942  | 1.8259          | 4.4522 | 16.2764 |
| 1.9177        | 18.0  | 9468  | 1.8231          | 4.3345 | 16.3084 |
| 1.8871        | 19.0  | 9994  | 1.8177          | 4.48   | 16.3458 |
| 1.8422        | 20.0  | 10520 | 1.8123          | 4.5078 | 16.287  |
| 1.8161        | 21.0  | 11046 | 1.8106          | 4.3289 | 16.3405 |
| 1.7972        | 22.0  | 11572 | 1.8106          | 4.5204 | 16.3244 |
| 1.7785        | 23.0  | 12098 | 1.8117          | 4.4651 | 16.3605 |
| 1.7563        | 24.0  | 12624 | 1.8125          | 4.3938 | 16.3538 |
| 1.7444        | 25.0  | 13150 | 1.8089          | 4.5367 | 16.3792 |
| 1.7256        | 26.0  | 13676 | 1.8075          | 4.4212 | 16.3925 |
| 1.7021        | 27.0  | 14202 | 1.8080          | 4.5491 | 16.3992 |
| 1.6969        | 28.0  | 14728 | 1.8061          | 4.6568 | 16.3645 |
| 1.6766        | 29.0  | 15254 | 1.8063          | 4.6297 | 16.3738 |
| 1.6653        | 30.0  | 15780 | 1.8095          | 4.6167 | 16.2977 |
| 1.6543        | 31.0  | 16306 | 1.8085          | 4.5452 | 16.3538 |
| 1.6413        | 32.0  | 16832 | 1.8112          | 4.6667 | 16.3351 |
| 1.6293        | 33.0  | 17358 | 1.8126          | 4.6127 | 16.3351 |
| 1.6204        | 34.0  | 17884 | 1.8115          | 4.7196 | 16.3111 |
| 1.6082        | 35.0  | 18410 | 1.8134          | 4.7011 | 16.3324 |
| 1.6048        | 36.0  | 18936 | 1.8122          | 4.6429 | 16.2964 |
| 1.5911        | 37.0  | 19462 | 1.8143          | 4.6424 | 16.3124 |
| 1.5834        | 38.0  | 19988 | 1.8131          | 4.6254 | 16.3164 |
| 1.5742        | 39.0  | 20514 | 1.8154          | 4.6998 | 16.287  |
| 1.5623        | 40.0  | 21040 | 1.8147          | 4.6469 | 16.3471 |
| 1.5599        | 41.0  | 21566 | 1.8185          | 4.6654 | 16.3231 |
| 1.5516        | 42.0  | 22092 | 1.8173          | 4.6961 | 16.3471 |
| 1.5441        | 43.0  | 22618 | 1.8180          | 4.7176 | 16.3084 |
| 1.545         | 44.0  | 23144 | 1.8177          | 4.5571 | 16.275  |
| 1.5418        | 45.0  | 23670 | 1.8195          | 4.5927 | 16.3097 |
| 1.5329        | 46.0  | 24196 | 1.8187          | 4.7025 | 16.2724 |
| 1.5348        | 47.0  | 24722 | 1.8198          | 4.6575 | 16.3057 |
| 1.5362        | 48.0  | 25248 | 1.8197          | 4.6912 | 16.2991 |
| 1.5231        | 49.0  | 25774 | 1.8202          | 4.6752 | 16.2951 |
| 1.5314        | 50.0  | 26300 | 1.8208          | 4.6114 | 16.2937 |
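Validation loss bottoms out around epoch 28 (1.8061) and drifts slowly upward afterwards, while BLEU plateaus. A minimal sketch of picking the best checkpoint by validation loss, using an excerpt of the (epoch, validation loss) pairs from the table above:

```python
# Excerpt of (epoch, validation loss) pairs from the training-results table.
val_loss = {1: 2.6759, 10: 1.8961, 20: 1.8123, 28: 1.8061, 40: 1.8147, 50: 1.8208}

# Best checkpoint = epoch with the lowest validation loss.
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # 28 1.8061
```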

Framework versions

  • Transformers 4.20.0.dev0
  • Pytorch 1.8.0
  • Datasets 2.1.0
  • Tokenizers 0.12.1