cs_mT5-large2_2e-5_50_v0.1

This model is a fine-tuned version of google/mt5-large; the fine-tuning dataset is not documented. At the final checkpoint (epoch 50, step 300) it achieves the following results on the evaluation set:

Loss: 4.5108
Bleu: 19.8919
Gen Len: 17.7619

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 16
eval_batch_size: 16
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 50
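
The linear scheduler listed above can be illustrated with a small sketch. It assumes zero warmup steps (the card lists none) and takes the total of 300 optimizer steps from the last row of the training results; the function name `linear_lr` and its parameters are hypothetical and are not taken from the training code.

```python
def linear_lr(step, base_lr=2e-5, total_steps=300, warmup_steps=0):
    """Learning rate after `step` optimizer updates: linear warmup, then linear decay to 0.

    warmup_steps=0 is an assumption; the model card does not report a warmup setting.
    """
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Decay linearly from base_lr to 0 over the remaining steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


# Learning rate at the start, midpoint, and end of training.
for step in (0, 150, 300):
    print(step, linear_lr(step))
```

Under these assumptions the learning rate starts at 2e-05, is roughly halved by step 150 (epoch 25), and reaches zero at step 300, which is consistent with the validation loss flattening over the last few epochs.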

Training results

Training Loss	Epoch	Step	Validation Loss	Bleu	Gen Len
16.9199	1.0	6	10.5138	9.6354	19.0
9.9396	2.0	12	8.3590	8.988	19.0
19.1783	3.0	18	7.4137	8.7723	19.0
9.8097	4.0	24	7.3182	8.8796	19.0
16.8467	5.0	30	7.2232	8.6892	19.0
9.745	6.0	36	6.9902	7.822	19.0
6.2948	7.0	42	6.8174	8.2013	19.0
6.3194	8.0	48	6.7064	7.6678	19.0
6.927	9.0	54	6.6122	9.9162	19.0
7.198	10.0	60	6.5138	13.3863	19.0
7.6505	11.0	66	6.4263	12.4078	19.0
7.9063	12.0	72	6.3326	13.0376	19.0
9.021	13.0	78	6.2376	13.6209	19.0
9.2462	14.0	84	6.1222	13.3871	19.0
7.7924	15.0	90	5.9968	14.1604	19.0
5.1947	16.0	96	5.8706	11.7859	19.0
9.9564	17.0	102	5.7396	13.4904	19.0
5.2706	18.0	108	5.6295	13.5218	19.0
6.6567	19.0	114	5.5203	14.0857	19.0
5.0918	20.0	120	5.3965	15.3213	19.0
6.2442	21.0	126	5.2742	15.6508	19.0
4.5073	22.0	132	5.1884	15.8637	19.0
3.3254	23.0	138	5.1282	14.7385	19.0
6.9905	24.0	144	5.0841	15.5385	19.0
6.3553	25.0	150	5.0408	16.9058	19.0
4.8396	26.0	156	5.0165	16.3831	19.0
4.7646	27.0	162	4.9914	16.2156	19.0
3.6864	28.0	168	4.9643	16.4319	19.0
4.7526	29.0	174	4.9186	17.5044	19.0
4.5518	30.0	180	4.8727	16.7818	19.0
3.9017	31.0	186	4.8264	16.9433	19.0
4.6864	32.0	192	4.7818	16.8868	19.0
3.0676	33.0	198	4.7505	18.2291	19.0
5.9861	34.0	204	4.7214	18.3309	19.0
5.0304	35.0	210	4.7003	18.3309	19.0
3.9478	36.0	216	4.6791	18.1004	19.0
4.9706	37.0	222	4.6651	17.787	19.0
5.0404	38.0	228	4.6401	17.787	19.0
4.938	39.0	234	4.6045	18.6261	17.7619
5.7176	40.0	240	4.5833	17.1931	17.7619
3.3352	41.0	246	4.5654	17.1931	17.7619
4.8397	42.0	252	4.5517	17.6767	17.7619
4.401	43.0	258	4.5441	17.1931	17.7619
5.4609	44.0	264	4.5370	17.5969	17.7619
4.9223	45.0	270	4.5295	19.1503	17.7619
4.092	46.0	276	4.5215	19.1133	17.7619
3.3364	47.0	282	4.5159	19.1133	17.7619
4.9208	48.0	288	4.5131	19.8919	17.7619
3.5934	49.0	294	4.5115	19.8919	17.7619
4.5551	50.0	300	4.5108	19.8919	17.7619
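
The step counts in the table give a rough idea of how small the training set is. The sketch below derives the number of steps per epoch and the range of training-set sizes consistent with it. It assumes a single device, no gradient accumulation, and that the last partial batch is kept; none of these are stated in the card, so the bounds are an estimate only.

```python
train_batch_size = 16  # from the hyperparameter list
final_step = 300       # last row of the results table
num_epochs = 50        # last row of the results table

# Optimizer steps per epoch (the Step column advances by 6 each epoch).
steps_per_epoch = final_step // num_epochs

# Assumptions (not stated in the card): one device, no gradient
# accumulation, and the final partial batch is not dropped.
max_examples = steps_per_epoch * train_batch_size
min_examples = (steps_per_epoch - 1) * train_batch_size + 1

print(f"steps per epoch: {steps_per_epoch}")
print(f"training examples: between {min_examples} and {max_examples}")
```

With so few examples per epoch, the logged training loss is likely to be noisy from row to row, so the validation loss and BLEU columns are the more reliable indicators of progress.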

Framework versions

Transformers 4.38.2
Pytorch 2.1.0+cu121
Datasets 2.18.0
Tokenizers 0.15.2

Model repository: kmok1/cs_mT5-large2_2e-5_50_v0.1
