mt5-small_final_final_new

This model is a fine-tuned version of google/mt5-small; the training dataset is not documented in this card. It achieves the following results on the evaluation set (a minimal inference sketch follows the metric list):

  • Loss: 1.2941
  • ROUGE-1: 41.3841
  • ROUGE-2: 32.6198
  • ROUGE-L: 38.6245
  • ROUGE-Lsum: 38.6833
  • BLEU: 28.8775
  • Average generation length: 17.0839
  • METEOR: 0.3704
  • No-answer accuracy: 0.0
  • Average cosine similarity: 0.7627
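
The sketch below shows one way to load and run the checkpoint with the Transformers AutoClasses. The Hub repository id is an assumption based on the card title, and the input text is a placeholder, since the training task and data are not documented here.

```python
# Minimal inference sketch for this checkpoint. The repo id below is assumed
# from the card title; replace it with the actual Hub path or a local directory.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "mt5-small_final_final_new"  # hypothetical repo id / local path
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

inputs = tokenizer("Your input text here", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```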

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 1.5e-05
  • train_batch_size: 16
  • eval_batch_size: 16
  • seed: 9
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 20
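
As a reading aid, here is a hedged sketch of how these values map onto Seq2SeqTrainingArguments in Transformers 4.31. The output directory is a placeholder, and the evaluation and generation flags are assumptions inferred from the per-epoch results table below; Adam's betas=(0.9, 0.999) and epsilon=1e-08 are the optimizer defaults, so they need no explicit arguments.

```python
# Hedged mapping of the listed hyperparameters onto Seq2SeqTrainingArguments
# (Transformers 4.31). Effective batch size: 16 * 8 accumulation steps = 128.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_final_final_new",  # placeholder
    learning_rate=1.5e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=8,
    seed=9,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    evaluation_strategy="epoch",   # assumption: one eval row per epoch in the table
    predict_with_generate=True,    # assumption: needed to score ROUGE/BLEU at eval
)
```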

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | BLEU | Gen Len | METEOR | No-ans Acc | Avg Cos Sim |
|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|:---|
| 14.5708 | 1.0 | 175 | 4.8623 | 10.2732 | 3.6837 | 9.295 | 9.3426 | 2.4037 | 8.7507 | 0.0865 | 0.0 | 0.4429 |
| 6.5938 | 1.99 | 350 | 3.0321 | 10.3823 | 5.1376 | 9.566 | 9.6003 | 3.8998 | 7.844 | 0.0969 | 0.0 | 0.4234 |
| 4.3372 | 2.99 | 525 | 2.3227 | 26.9602 | 18.9826 | 25.2396 | 25.2665 | 9.7754 | 12.2901 | 0.2376 | 0.0 | 0.6442 |
| 3.4266 | 3.98 | 700 | 2.0083 | 31.5678 | 23.6447 | 29.6748 | 29.7026 | 12.8064 | 13.222 | 0.2877 | 0.0 | 0.6947 |
| 3.0011 | 4.98 | 875 | 1.8600 | 32.2283 | 24.3874 | 30.2293 | 30.2518 | 14.2873 | 13.6664 | 0.2984 | 0.0 | 0.704 |
| 2.7444 | 5.97 | 1050 | 1.7535 | 32.4685 | 24.6833 | 30.4294 | 30.4397 | 14.9587 | 13.8386 | 0.3029 | 0.0 | 0.7074 |
| 2.5506 | 6.97 | 1225 | 1.6692 | 32.5693 | 24.8903 | 30.5541 | 30.5742 | 15.3203 | 13.9335 | 0.305 | 0.0 | 0.7097 |
| 2.4241 | 7.96 | 1400 | 1.5991 | 32.763 | 25.0389 | 30.7387 | 30.7372 | 15.8514 | 13.9643 | 0.3078 | 0.0 | 0.7127 |
| 2.2984 | 8.96 | 1575 | 1.5373 | 32.7553 | 25.113 | 30.7279 | 30.7385 | 16.1118 | 14.0551 | 0.3085 | 0.0 | 0.7126 |
| 2.2212 | 9.95 | 1750 | 1.4843 | 32.1917 | 24.619 | 30.2246 | 30.2458 | 16.1846 | 14.0741 | 0.3037 | 0.0 | 0.7068 |
| 2.1401 | 10.95 | 1925 | 1.4425 | 32.2614 | 24.7428 | 30.3223 | 30.3377 | 16.3919 | 13.9891 | 0.3044 | 0.0 | 0.7087 |
| 2.0755 | 11.94 | 2100 | 1.4034 | 32.222 | 24.6764 | 30.2975 | 30.3261 | 16.504 | 13.9859 | 0.3043 | 0.0 | 0.71 |
| 2.0328 | 12.94 | 2275 | 1.3723 | 32.1828 | 24.6096 | 30.2115 | 30.2389 | 16.5263 | 13.9632 | 0.3038 | 0.0 | 0.7099 |
| 1.9793 | 13.93 | 2450 | 1.3478 | 32.3184 | 24.6774 | 30.333 | 30.3495 | 16.8168 | 14.2392 | 0.3046 | 0.0 | 0.7097 |
| 1.9541 | 14.93 | 2625 | 1.3288 | 39.7212 | 31.117 | 37.1213 | 37.1596 | 26.1835 | 16.4908 | 0.3582 | 0.0 | 0.7527 |
| 1.9287 | 15.92 | 2800 | 1.3136 | 41.2942 | 32.5064 | 38.5652 | 38.6121 | 28.7564 | 17.0243 | 0.3693 | 0.0 | 0.7619 |
| 1.8985 | 16.92 | 2975 | 1.3059 | 41.3069 | 32.5558 | 38.5643 | 38.607 | 28.7815 | 17.0815 | 0.3697 | 0.0 | 0.7619 |
| 1.8938 | 17.91 | 3150 | 1.2985 | 41.4096 | 32.6579 | 38.6483 | 38.7074 | 28.8733 | 17.0759 | 0.3707 | 0.0 | 0.7628 |
| 1.8795 | 18.91 | 3325 | 1.2941 | 41.3841 | 32.6198 | 38.6245 | 38.6833 | 28.8775 | 17.0839 | 0.3704 | 0.0 | 0.7627 |
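
Metrics of the kind reported above can be computed with the `evaluate` library, as sketched below. The exact metric scripts and post-processing used for this card are not documented, and ROUGE and BLEU appear to be reported on a 0-100 scale (raw scores times 100), so this is an illustration rather than an exact reproduction.

```python
# Hedged sketch: computing metrics of the kind reported in the table with the
# `evaluate` library. The exact scripts used for this card are not documented.
import evaluate

rouge = evaluate.load("rouge")
bleu = evaluate.load("bleu")
meteor = evaluate.load("meteor")

predictions = ["a generated answer"]   # placeholder model outputs
references = ["the reference answer"]  # placeholder gold answers

print(rouge.compute(predictions=predictions, references=references))
print(bleu.compute(predictions=predictions,
                   references=[[r] for r in references]))
print(meteor.compute(predictions=predictions, references=references))
```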

Framework versions

  • Transformers 4.31.0
  • PyTorch 2.0.1+cu118
  • Datasets 2.13.1
  • Tokenizers 0.13.3