
cs_mT5-large2_2e-5_100_v0.4

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 3.1297
  • Bleu: 12.9589
  • Gen Len: 15.7619
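
Since the card does not document the task or dataset, the sketch below only shows how a checkpoint like this would typically be loaded and run for generation; the repository id, input text, and generation length are placeholder assumptions, not taken from this card:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repo id; substitute the actual path of this checkpoint.
model_id = "cs_mT5-large2_2e-5_100_v0.4"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Placeholder input; the training task is not documented in this card.
inputs = tokenizer("example input text", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```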

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 100
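
A minimal sketch of how these settings map onto `Seq2SeqTrainingArguments` in Transformers 4.38 (dataset loading, preprocessing, and the metric function are omitted because the training data is not documented; `output_dir` is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="cs_mT5-large2_2e-5_100_v0.4",  # placeholder
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=100,
    evaluation_strategy="epoch",  # the results table below logs metrics once per epoch
    predict_with_generate=True,   # needed to report Bleu and Gen Len at eval time
)
```

Passing these arguments to a `Seq2SeqTrainer` together with the model, tokenizer, and a BLEU-computing `compute_metrics` function would reproduce the setup described above, assuming the same (undocumented) dataset.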

Training results

| Training Loss | Epoch | Step | Validation Loss | Bleu | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:----:|:-------:|
| 18.9475 | 1.0 | 6 | 8.4357 | 8.3431 | 19.0 |
| 11.3295 | 2.0 | 12 | 7.0692 | 8.4786 | 19.0 |
| 11.2185 | 3.0 | 18 | 6.4881 | 7.8425 | 19.0 |
| 9.7688 | 4.0 | 24 | 6.2043 | 7.4958 | 19.0 |
| 7.633 | 5.0 | 30 | 6.1694 | 7.4994 | 19.0 |
| 10.8618 | 6.0 | 36 | 6.0789 | 7.2123 | 19.0 |
| 9.4099 | 7.0 | 42 | 6.0121 | 7.4767 | 19.0 |
| 7.1028 | 8.0 | 48 | 5.9718 | 7.4839 | 19.0 |
| 10.2013 | 9.0 | 54 | 5.9141 | 8.246 | 19.0 |
| 11.8248 | 10.0 | 60 | 5.8562 | 8.4493 | 19.0 |
| 6.4776 | 11.0 | 66 | 5.7982 | 8.3904 | 19.0 |
| 6.813 | 12.0 | 72 | 5.7229 | 8.6768 | 19.0 |
| 10.7703 | 13.0 | 78 | 5.6543 | 8.6985 | 19.0 |
| 7.1642 | 14.0 | 84 | 5.6137 | 8.8415 | 19.0 |
| 8.1195 | 15.0 | 90 | 5.5589 | 8.9574 | 19.0 |
| 10.4234 | 16.0 | 96 | 5.4890 | 8.8876 | 19.0 |
| 6.9893 | 17.0 | 102 | 5.4146 | 10.5467 | 19.0 |
| 7.3889 | 18.0 | 108 | 5.3484 | 10.5658 | 19.0 |
| 6.042 | 19.0 | 114 | 5.3008 | 10.4592 | 19.0 |
| 8.1065 | 20.0 | 120 | 5.2775 | 9.958 | 19.0 |
| 5.2708 | 21.0 | 126 | 5.2247 | 9.4528 | 19.0 |
| 4.9285 | 22.0 | 132 | 5.1740 | 7.2801 | 19.0 |
| 4.8751 | 23.0 | 138 | 5.1216 | 7.2902 | 19.0 |
| 5.6123 | 24.0 | 144 | 5.0377 | 9.9147 | 19.0 |
| 4.6797 | 25.0 | 150 | 4.9345 | 10.7926 | 19.0 |
| 4.3882 | 26.0 | 156 | 4.8420 | 10.7388 | 19.0 |
| 5.2828 | 27.0 | 162 | 4.7564 | 10.6526 | 19.0 |
| 4.2994 | 28.0 | 168 | 4.6960 | 9.7076 | 19.0 |
| 3.964 | 29.0 | 174 | 4.5760 | 9.6231 | 19.0 |
| 9.4351 | 30.0 | 180 | 4.5173 | 8.9601 | 19.0 |
| 5.1911 | 31.0 | 186 | 4.4738 | 9.8122 | 19.0 |
| 3.4724 | 32.0 | 192 | 4.4102 | 10.0874 | 19.0 |
| 4.39 | 33.0 | 198 | 4.3444 | 10.9115 | 19.0 |
| 6.3546 | 34.0 | 204 | 4.2755 | 11.0007 | 18.7619 |
| 5.4393 | 35.0 | 210 | 4.2020 | 11.127 | 18.7619 |
| 2.8915 | 36.0 | 216 | 4.1248 | 11.0087 | 18.7619 |
| 6.6742 | 37.0 | 222 | 4.0757 | 11.4869 | 18.9048 |
| 3.5017 | 38.0 | 228 | 4.0376 | 11.4063 | 18.9048 |
| 4.1386 | 39.0 | 234 | 3.9890 | 10.7683 | 18.8095 |
| 4.9058 | 40.0 | 240 | 3.9372 | 10.7683 | 18.8095 |
| 4.2836 | 41.0 | 246 | 3.8872 | 10.8023 | 18.8095 |
| 3.7174 | 42.0 | 252 | 3.8378 | 10.9883 | 18.4286 |
| 3.0365 | 43.0 | 258 | 3.7907 | 11.0504 | 18.381 |
| 3.3476 | 44.0 | 264 | 3.7541 | 11.0634 | 17.9524 |
| 3.9578 | 45.0 | 270 | 3.7206 | 11.4798 | 17.4762 |
| 4.3193 | 46.0 | 276 | 3.6877 | 11.5753 | 17.3333 |
| 3.6244 | 47.0 | 282 | 3.6538 | 11.8793 | 16.5714 |
| 2.9136 | 48.0 | 288 | 3.6136 | 12.0169 | 15.9524 |
| 2.2932 | 49.0 | 294 | 3.5735 | 11.126 | 16.3333 |
| 3.4335 | 50.0 | 300 | 3.5422 | 11.3689 | 16.3333 |
| 2.9941 | 51.0 | 306 | 3.5072 | 11.0037 | 16.3333 |
| 4.7679 | 52.0 | 312 | 3.4671 | 10.8819 | 16.3333 |
| 2.7498 | 53.0 | 318 | 3.4404 | 10.8882 | 16.619 |
| 3.5759 | 54.0 | 324 | 3.4195 | 10.8882 | 16.619 |
| 2.9012 | 55.0 | 330 | 3.4040 | 10.5862 | 17.3333 |
| 3.2823 | 56.0 | 336 | 3.3924 | 10.57 | 17.0476 |
| 4.5793 | 57.0 | 342 | 3.3806 | 11.605 | 17.0 |
| 4.271 | 58.0 | 348 | 3.3672 | 11.7166 | 16.8095 |
| 3.5227 | 59.0 | 354 | 3.3569 | 11.8219 | 16.5238 |
| 2.9193 | 60.0 | 360 | 3.3481 | 11.8931 | 16.381 |
| 3.5956 | 61.0 | 366 | 3.3385 | 11.1876 | 16.5238 |
| 3.5521 | 62.0 | 372 | 3.3293 | 11.0871 | 16.7143 |
| 2.6291 | 63.0 | 378 | 3.3136 | 11.2216 | 16.5714 |
| 2.0321 | 64.0 | 384 | 3.2935 | 11.29 | 16.381 |
| 2.5651 | 65.0 | 390 | 3.2846 | 11.3853 | 16.5714 |
| 2.9702 | 66.0 | 396 | 3.2866 | 11.3853 | 16.5714 |
| 2.2628 | 67.0 | 402 | 3.2755 | 11.294 | 16.3333 |
| 2.5516 | 68.0 | 408 | 3.2619 | 11.699 | 16.1429 |
| 3.3097 | 69.0 | 414 | 3.2485 | 11.682 | 16.1429 |
| 1.8752 | 70.0 | 420 | 3.2383 | 11.8141 | 15.9048 |
| 2.3432 | 71.0 | 426 | 3.2299 | 11.8141 | 15.9048 |
| 2.2128 | 72.0 | 432 | 3.2202 | 12.0422 | 15.381 |
| 2.7711 | 73.0 | 438 | 3.2107 | 11.9983 | 15.5238 |
| 3.7951 | 74.0 | 444 | 3.2039 | 12.2396 | 15.6667 |
| 2.7207 | 75.0 | 450 | 3.1969 | 12.9329 | 15.619 |
| 2.071 | 76.0 | 456 | 3.1905 | 12.4005 | 15.619 |
| 1.9696 | 77.0 | 462 | 3.1861 | 12.5352 | 15.2857 |
| 1.2979 | 78.0 | 468 | 3.1816 | 12.5352 | 15.2857 |
| 2.6149 | 79.0 | 474 | 3.1777 | 12.5352 | 15.2857 |
| 1.7925 | 80.0 | 480 | 3.1720 | 12.5352 | 15.2857 |
| 2.3365 | 81.0 | 486 | 3.1683 | 12.4005 | 15.619 |
| 3.0536 | 82.0 | 492 | 3.1653 | 12.4005 | 15.619 |
| 2.6278 | 83.0 | 498 | 3.1617 | 12.4005 | 15.619 |
| 3.2318 | 84.0 | 504 | 3.1583 | 12.4005 | 15.619 |
| 2.9789 | 85.0 | 510 | 3.1569 | 12.4005 | 15.619 |
| 2.3504 | 86.0 | 516 | 3.1537 | 12.4005 | 15.619 |
| 1.603 | 87.0 | 522 | 3.1508 | 12.4005 | 15.619 |
| 3.2194 | 88.0 | 528 | 3.1486 | 12.2448 | 16.0952 |
| 2.6168 | 89.0 | 534 | 3.1459 | 12.2448 | 16.0952 |
| 2.3382 | 90.0 | 540 | 3.1429 | 12.2448 | 16.0952 |
| 3.6469 | 91.0 | 546 | 3.1397 | 12.8612 | 15.9048 |
| 2.2697 | 92.0 | 552 | 3.1371 | 12.8612 | 15.9048 |
| 1.8352 | 93.0 | 558 | 3.1356 | 12.8612 | 15.9048 |
| 1.3854 | 94.0 | 564 | 3.1344 | 12.8612 | 15.9048 |
| 2.6405 | 95.0 | 570 | 3.1336 | 12.8612 | 15.9048 |
| 2.0361 | 96.0 | 576 | 3.1321 | 12.8612 | 15.9048 |
| 3.4828 | 97.0 | 582 | 3.1311 | 12.9589 | 15.7619 |
| 2.6929 | 98.0 | 588 | 3.1304 | 12.9589 | 15.7619 |
| 2.2882 | 99.0 | 594 | 3.1299 | 12.9589 | 15.7619 |
| 2.4893 | 100.0 | 600 | 3.1297 | 12.9589 | 15.7619 |

Framework versions

  • Transformers 4.38.2
  • Pytorch 2.1.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
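
To compare a local environment against these versions, a quick check (convenience only; newer versions will usually also load the model):

```python
import datasets
import tokenizers
import torch
import transformers

# Print installed versions to compare against the list above.
for mod in (transformers, torch, datasets, tokenizers):
    print(f"{mod.__name__}: {mod.__version__}")
```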