cs_mT5-large_0.01_100_v0.1

This model is a fine-tuned version of google/mt5-large on an unknown dataset. It achieves the following results on the evaluation set at the end of training (epoch 100, step 600):

Loss: 6.2112
Bleu: 0.8171
Gen Len: 19.0

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.01
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 100
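
To make the linear schedule concrete, the following is a minimal sketch of how the learning rate would change over the run. It assumes zero warmup steps (the card does not state a warmup value) and takes the total of 600 optimizer steps from the training results (6 steps per epoch over 100 epochs). The function name `linear_lr` and its parameters are illustrative, not part of the original training script.

```python
def linear_lr(step: int,
              base_lr: float = 0.01,
              total_steps: int = 600,
              warmup_steps: int = 0) -> float:
    """Learning rate at `step` for linear warmup followed by linear decay to zero.

    Assumption: warmup_steps=0, since the card lists no warmup setting.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Linear decay from base_lr down to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


if __name__ == "__main__":
    # Print the rate at the start, midpoint, and end of training.
    for s in (0, 300, 600):
        print(s, linear_lr(s))
```

Under these assumptions the rate starts at the configured 0.01, falls to half of that by step 300, and reaches zero at step 600.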

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
9.5426	1.0	6	13.2737	0.0	19.0
9.5598	2.0	12	57.9184	0.2088	19.0
8.9859	3.0	18	7.1357	0.2088	19.0
4.8419	4.0	24	6.5896	0.0	2.0
6.0099	5.0	30	5.9797	0.0	19.0
5.4071	6.0	36	6.0228	0.0	19.0
5.4716	7.0	42	6.0132	0.0	19.0
5.6419	8.0	48	5.9242	0.0	19.0
5.7044	9.0	54	6.0529	0.0	19.0
5.7007	10.0	60	5.8730	0.0	19.0
5.0052	11.0	66	6.0392	0.0	19.0
6.3889	12.0	72	6.0776	0.0	19.0
5.2703	13.0	78	70.6639	0.0	19.0
7.1444	14.0	84	7.6067	0.0	19.0
4.7785	15.0	90	6.5610	0.0	19.0
5.6738	16.0	96	6.0522	0.0	19.0
5.5087	17.0	102	6.0558	0.0	19.0
5.4367	18.0	108	5.9737	0.0	19.0
5.5081	19.0	114	6.0431	0.0	19.0
5.2506	20.0	120	5.9623	0.0	19.0
5.354	21.0	126	6.0081	0.0	19.0
5.5891	22.0	132	5.9859	0.0	19.0
5.2457	23.0	138	5.9296	0.0	19.0
4.9566	24.0	144	6.0038	0.0	19.0
5.3327	25.0	150	6.0421	0.0	19.0
4.946	26.0	156	6.0225	0.0	19.0
5.1903	27.0	162	5.9587	0.0	19.0
5.0797	28.0	168	5.9780	0.0	19.0
4.8033	29.0	174	6.0577	0.0	19.0
5.559	30.0	180	6.0250	0.0	19.0
5.7859	31.0	186	5.9493	0.0	19.0
5.4172	32.0	192	6.0647	0.0	19.0
4.9906	33.0	198	6.0617	0.0	19.0
4.9745	34.0	204	5.9800	0.0	19.0
5.2086	35.0	210	5.9942	0.0	19.0
5.7047	36.0	216	5.9996	0.0	19.0
4.4275	37.0	222	6.0826	0.0	19.0
4.9545	38.0	228	6.0865	0.0	19.0
5.1466	39.0	234	5.9571	0.0	19.0
5.5095	40.0	240	5.9970	0.0	19.0
5.1998	41.0	246	5.9978	0.0	19.0
4.8406	42.0	252	6.0314	0.0	19.0
5.0467	43.0	258	6.0444	0.0	19.0
5.2282	44.0	264	6.0295	0.0	19.0
4.8847	45.0	270	6.0284	0.0	19.0
5.5734	46.0	276	6.0598	0.0	19.0
4.743	47.0	282	6.0396	0.0	19.0
5.3795	48.0	288	6.0567	0.0	19.0
4.9066	49.0	294	6.0615	0.0	19.0
4.9682	50.0	300	6.1018	0.0	19.0
4.828	51.0	306	6.0605	0.0	19.0
4.5153	52.0	312	6.0531	0.0	19.0
5.2316	53.0	318	5.9855	0.0	19.0
4.8071	54.0	324	6.0292	0.0	19.0
5.106	55.0	330	6.0541	0.0	19.0
4.9581	56.0	336	5.9499	0.0	19.0
4.8037	57.0	342	6.1083	0.0	19.0
4.7738	58.0	348	6.0111	0.0	19.0
5.3786	59.0	354	6.0164	0.0	19.0
4.8782	60.0	360	5.9442	0.0	19.0
4.8589	61.0	366	5.9036	0.8171	19.0
4.8486	62.0	372	5.7896	0.8171	19.0
4.4303	63.0	378	5.8475	0.8171	19.0
5.116	64.0	384	5.7361	0.8171	19.0
4.9206	65.0	390	5.7211	0.8171	19.0
4.5294	66.0	396	5.6845	0.8171	19.0
5.0969	67.0	402	5.6964	0.8171	19.0
4.4403	68.0	408	5.7035	0.8171	19.0
4.3498	69.0	414	5.7088	0.8171	19.0
5.0456	70.0	420	5.6742	0.8171	19.0
4.9812	71.0	426	5.6820	0.8171	19.0
4.4053	72.0	432	5.7010	0.8171	19.0
4.8459	73.0	438	5.8511	0.8171	19.0
4.3272	74.0	444	5.7204	0.8171	19.0
4.4791	75.0	450	5.7542	0.8171	19.0
4.5272	76.0	456	5.7444	0.8171	19.0
4.2581	77.0	462	5.7456	0.879	19.0
4.718	78.0	468	5.7187	0.8171	19.0
4.3661	79.0	474	5.8472	0.8291	19.0
4.8016	80.0	480	5.7478	0.8171	19.0
4.1973	81.0	486	5.8850	0.8171	19.0
4.0916	82.0	492	5.7678	0.8171	19.0
4.1624	83.0	498	5.8662	0.8171	19.0
4.2458	84.0	504	5.9224	0.8171	19.0
3.7141	85.0	510	5.8928	0.8171	19.0
3.5796	86.0	516	6.0489	0.937	19.0
4.8417	87.0	522	6.1602	0.8171	19.0
4.3568	88.0	528	5.9343	0.8171	19.0
4.6028	89.0	534	5.9039	0.8171	19.0
3.6638	90.0	540	6.1188	0.879	19.0
4.1465	91.0	546	6.0166	0.8171	19.0
4.32	92.0	552	6.0690	0.8171	19.0
4.0945	93.0	558	6.0812	0.8171	19.0
3.9572	94.0	564	5.9877	0.8171	19.0
3.9032	95.0	570	6.0960	0.2223	19.0
4.3571	96.0	576	6.1585	0.8171	19.0
3.768	97.0	582	6.1953	0.8171	19.0
3.94	98.0	588	6.2025	0.8171	19.0
3.8452	99.0	594	6.2129	0.8171	19.0
4.4174	100.0	600	6.2112	0.8171	19.0
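
The table shows that the final checkpoint (epoch 100) is not the row with the lowest validation loss or the highest Bleu. As a minimal sketch of how one might pick a checkpoint from such a log, the snippet below parses a few rows copied from the table and reports the best row by each metric. The `LOG` excerpt and the helper name `load_rows` are illustrative; to analyse the full run, paste the whole table in the same tab-separated form.

```python
import csv
import io

# A few rows copied from the training results table (tab-separated).
LOG = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tBleu\tGen Len\n"
    "5.7007\t10.0\t60\t5.8730\t0.0\t19.0\n"
    "5.0456\t70.0\t420\t5.6742\t0.8171\t19.0\n"
    "3.5796\t86.0\t516\t6.0489\t0.937\t19.0\n"
    "4.4174\t100.0\t600\t6.2112\t0.8171\t19.0\n"
)


def load_rows(text: str) -> list[dict[str, float]]:
    """Parse a tab-separated training log into a list of numeric rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [{key: float(value) for key, value in row.items()} for row in reader]


rows = load_rows(LOG)

# Lowest validation loss and highest Bleu need not fall on the same epoch.
best_loss = min(rows, key=lambda r: r["Validation Loss"])
best_bleu = max(rows, key=lambda r: r["Bleu"])

print("lowest validation loss:", best_loss["Epoch"], best_loss["Validation Loss"])
print("highest Bleu:", best_bleu["Epoch"], best_bleu["Bleu"])
```

In this excerpt the two criteria select different epochs, so the choice of checkpoint depends on which metric matters for the intended use.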

Framework versions

Transformers 4.35.2
PyTorch 1.13.1+cu117
Datasets 2.17.0
Tokenizers 0.15.2
