iwslt_aligned_smallT5_cont0

This model is a fine-tuned version of google/mt5-small on the paulh27/alignment_iwslt2017_de_en dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5612
  • BLEU: 65.6358
  • Gen Len: 28.7691
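
As a quick sanity check, the checkpoint can be loaded with the standard transformers Auto classes. This is a minimal sketch only: the exact source-side input format used during fine-tuning (e.g. a task prefix or language tag) is not documented in this card, so the plain German sentence below is an illustrative assumption.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "paulh27/iwslt_aligned_smallT5_cont0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Illustrative German input; the source formatting expected by this
# fine-tune is not documented here and may differ.
text = "Vielen Dank für Ihre Aufmerksamkeit."
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=4, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```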

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: Adafactor
  • lr_scheduler_type: constant
  • training_steps: 500000
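
For reference, these settings map roughly onto transformers' Seq2SeqTrainingArguments as in the sketch below. This is a reconstruction from the list above, not the authors' actual training script; the output directory is a placeholder.

```python
from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the reported hyperparameters; the
# actual training script is not part of this card.
training_args = Seq2SeqTrainingArguments(
    output_dir="iwslt_aligned_smallT5_cont0",  # placeholder
    learning_rate=2e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=2,  # 8 x 2 = 16 total train batch size
    optim="adafactor",
    lr_scheduler_type="constant",
    max_steps=500_000,
    predict_with_generate=True,  # required to report BLEU / Gen Len at eval
)
```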

Training results

| Training Loss | Epoch | Step   | Validation Loss | BLEU    | Gen Len |
|:-------------:|:-----:|:------:|:---------------:|:-------:|:-------:|
| 1.2426        | 0.78  | 10000  | 0.8300          | 46.2793 | 28.6532 |
| 0.9931        | 1.55  | 20000  | 0.6756          | 52.2709 | 28.6441 |
| 0.8573        | 2.33  | 30000  | 0.6143          | 55.8294 | 28.5405 |
| 0.762         | 3.11  | 40000  | 0.5811          | 57.5135 | 28.366  |
| 0.734         | 3.88  | 50000  | 0.5499          | 58.6125 | 28.5101 |
| 0.6722        | 4.66  | 60000  | 0.5228          | 59.6427 | 28.8356 |
| 0.6215        | 5.43  | 70000  | 0.5161          | 60.4701 | 28.7534 |
| 0.5756        | 6.21  | 80000  | 0.5068          | 62.0864 | 28.6498 |
| 0.5738        | 6.99  | 90000  | 0.5005          | 61.9714 | 28.5788 |
| 0.5384        | 7.76  | 100000 | 0.4909          | 62.407  | 28.5282 |
| 0.5109        | 8.54  | 110000 | 0.4902          | 62.1452 | 28.4617 |
| 0.4816        | 9.32  | 120000 | 0.4875          | 62.6499 | 28.5518 |
| 0.4493        | 10.09 | 130000 | 0.4867          | 62.6694 | 28.6993 |
| 0.4648        | 10.87 | 140000 | 0.4775          | 63.3179 | 28.5495 |
| 0.4414        | 11.64 | 150000 | 0.4787          | 63.6928 | 28.4673 |
| 0.4158        | 12.42 | 160000 | 0.4792          | 63.8752 | 28.5011 |
| 0.3895        | 13.2  | 170000 | 0.4794          | 63.8429 | 28.6498 |
| 0.4031        | 13.97 | 180000 | 0.4757          | 63.9496 | 28.7264 |
| 0.3844        | 14.75 | 190000 | 0.4855          | 63.7498 | 28.8288 |
| 0.3637        | 15.53 | 200000 | 0.4800          | 64.2277 | 28.661  |
| 0.3473        | 16.3  | 210000 | 0.4854          | 64.4683 | 28.786  |
| 0.3243        | 17.08 | 220000 | 0.4903          | 64.7805 | 28.6791 |
| 0.3426        | 17.85 | 230000 | 0.4819          | 64.679  | 28.4809 |
| 0.3295        | 18.63 | 240000 | 0.4852          | 65.3735 | 28.6014 |
| 0.3124        | 19.41 | 250000 | 0.4947          | 64.5641 | 28.6745 |
| 0.2933        | 20.18 | 260000 | 0.4972          | 65.1364 | 28.6419 |
| 0.3101        | 20.96 | 270000 | 0.4902          | 64.6747 | 28.6802 |
| 0.2991        | 21.74 | 280000 | 0.4907          | 64.9732 | 28.5653 |
| 0.2828        | 22.51 | 290000 | 0.5038          | 64.7552 | 28.6261 |
| 0.2688        | 23.29 | 300000 | 0.5042          | 65.0702 | 28.7534 |
| 0.2555        | 24.06 | 310000 | 0.5101          | 65.0378 | 29.089  |
| 0.2692        | 24.84 | 320000 | 0.5022          | 64.9991 | 28.6937 |
| 0.2593        | 25.62 | 330000 | 0.5085          | 65.2478 | 28.6137 |
| 0.2439        | 26.39 | 340000 | 0.5152          | 64.863  | 28.6464 |
| 0.2327        | 27.17 | 350000 | 0.5165          | 65.0748 | 28.7286 |
| 0.249         | 27.95 | 360000 | 0.5116          | 64.7249 | 28.6137 |
| 0.238         | 28.72 | 370000 | 0.5202          | 64.7651 | 28.5968 |
| 0.2297        | 29.5  | 380000 | 0.5243          | 65.3334 | 28.7005 |
| 0.2152        | 30.27 | 390000 | 0.5336          | 64.9364 | 28.6081 |
| 0.2106        | 31.05 | 400000 | 0.5408          | 65.117  | 28.6745 |
| 0.2234        | 31.83 | 410000 | 0.5249          | 64.8926 | 28.6318 |
| 0.2085        | 32.6  | 420000 | 0.5306          | 65.5715 | 28.7984 |
| 0.2018        | 33.38 | 430000 | 0.5429          | 64.9154 | 28.6351 |
| 0.1885        | 34.16 | 440000 | 0.5453          | 65.0538 | 28.8525 |
| 0.2049        | 34.93 | 450000 | 0.5434          | 65.2857 | 28.7207 |
| 0.1957        | 35.71 | 460000 | 0.5491          | 65.3436 | 28.714  |
| 0.1867        | 36.49 | 470000 | 0.5536          | 65.4934 | 28.7939 |
| 0.1765        | 37.26 | 480000 | 0.5583          | 65.5595 | 28.8255 |
| 0.1786        | 38.04 | 490000 | 0.5612          | 65.6358 | 28.7691 |
| 0.1809        | 38.81 | 500000 | 0.5573          | 65.0266 | 28.7455 |
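
The card does not state how BLEU was computed. A common setup for cards like this one uses sacreBLEU via the evaluate library, as in the sketch below; whether this matches the exact metric configuration reported above is an assumption.

```python
import evaluate

# sacreBLEU via the evaluate library; that this matches the BLEU
# configuration used for the table above is an assumption.
bleu = evaluate.load("sacrebleu")
predictions = ["Thank you very much for your attention."]
references = [["Thank you very much for your attention."]]
result = bleu.compute(predictions=predictions, references=references)
print(round(result["score"], 4))
```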

Framework versions

  • Transformers 4.39.3
  • Pytorch 2.2.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2