speller-t5-9001

This model is a fine-tuned version of sberbank-ai/ruT5-base. The training dataset is not specified. At the final evaluation checkpoint (step 14500), it achieves the following results on the evaluation set:

Loss: 0.1587
Rouge1: 17.0762
Rouge2: 5.6336
Rougel: 17.1181
Rougelsum: 17.2316
Gen Len: 40.2034

Model description

More information needed

Intended uses & limitations

More information needed
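
Until this section is filled in, the following is a minimal, unofficial sketch of how the checkpoint could be loaded for inference with Transformers. It rests on several assumptions: the repository id `summervent/speller-t5-9001` is taken from the page header, the task is assumed to be spelling correction based only on the model name, and no task prefix is added to the input because the card does not document one. The helper name `correct_text` and the `max_new_tokens` value are hypothetical.

```python
# Repository id as shown in the page header (assumed to be the Hub id).
MODEL_ID = "summervent/speller-t5-9001"


def correct_text(text, model_id=MODEL_ID, max_new_tokens=64):
    """Run one input string through the seq2seq model and return the output text.

    Assumption: the model expects raw text with no task prefix.
    """
    # Imported lazily so the module can be loaded without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

    inputs = tokenizer(text, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `correct_text("...")` downloads the weights on first use. Loading the tokenizer and model once outside the function would be preferable for batch use.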

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 42
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
num_epochs: 1
mixed_precision_training: Native AMP
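
As a rough sketch, the hyperparameters above could be expressed as keyword arguments for `Seq2SeqTrainingArguments` in Transformers. This is not the original training script. The `output_dir` value and the helper name `build_training_args` are placeholders. The mapping assumes a single device (so `train_batch_size` equals the per-device batch size), assumes "Native AMP" corresponds to `fp16=True`, and assumes `predict_with_generate=True` because ROUGE and generation length are reported.

```python
# Hyperparameters from the list above, as Seq2SeqTrainingArguments keywords.
TRAINING_KWARGS = {
    "output_dir": "speller-t5-9001",      # placeholder, not from the card
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 8,     # assumes single-device training
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 1,
    "fp16": True,                         # assumed meaning of "Native AMP"; needs a GPU
    "predict_with_generate": True,        # assumption: required to compute ROUGE
}


def build_training_args(**overrides):
    """Build the arguments object; overrides replace any of the defaults above."""
    # Imported lazily so the dict can be inspected without transformers installed.
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(**{**TRAINING_KWARGS, **overrides})
```

On a machine without a GPU, `build_training_args(fp16=False)` avoids the mixed-precision requirement.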

Training results

Training Loss	Epoch	Step	Validation Loss	Rouge1	Rouge2	Rougel	Rougelsum	Gen Len
1.172	0.03	500	0.5659	14.1669	4.1265	13.7878	14.1044	42.7458
0.7063	0.07	1000	0.4207	14.5638	4.8305	14.4688	14.5907	43.8898
0.6604	0.1	1500	0.3557	16.2672	4.8685	16.2308	16.3516	43.8644
0.5429	0.14	2000	0.3266	16.6436	5.1161	16.6667	16.6872	43.4831
0.5245	0.17	2500	0.2964	16.6667	5.1963	16.6775	16.7707	42.3983
0.5812	0.2	3000	0.2757	16.6969	5.339	16.7331	16.8449	41.3051
0.5019	0.24	3500	0.2626	16.686	5.4462	16.6815	16.8733	40.7203
0.4182	0.27	4000	0.2531	16.7142	5.5085	16.6667	16.9373	40.6102
0.4592	0.31	4500	0.2413	16.947	5.5404	16.9581	17.059	40.1441
0.4626	0.34	5000	0.2299	16.9492	5.6063	16.944	17.0235	40.3475
0.4158	0.38	5500	0.2228	16.8653	5.5608	16.9429	17.0407	39.5085
0.4261	0.41	6000	0.2185	16.9293	5.5843	16.9492	17.0365	39.8814
0.4465	0.44	6500	0.2088	16.9492	5.5968	16.9895	17.1106	39.4746
0.3919	0.48	7000	0.2015	16.9492	5.5843	16.9839	17.0937	39.6864
0.3994	0.51	7500	0.2023	17.0836	5.6632	17.0588	17.1895	40.5932
0.466	0.55	8000	0.1968	17.1664	5.7257	17.1664	17.3019	40.4153
0.419	0.58	8500	0.1899	17.0132	5.6021	17.0625	17.1945	39.4831
0.4047	0.61	9000	0.1877	17.0418	5.6217	16.9895	17.1106	39.9237
0.3728	0.65	9500	0.1798	16.9856	5.5876	16.9947	17.1612	39.4237
0.3685	0.68	10000	0.1768	16.9856	5.6249	16.9492	17.1339	39.2966
0.4241	0.72	10500	0.1739	16.9908	5.595	17.0532	17.1845	39.3814
0.3006	0.75	11000	0.1740	16.9492	5.5799	16.9802	17.1525	39.5085
0.339	0.78	11500	0.1739	17.0495	5.6497	17.047	17.1796	39.8136
0.3387	0.82	12000	0.1711	16.9908	5.595	17.0532	17.1845	39.4746
0.3116	0.85	12500	0.1642	16.9492	5.5799	16.9802	17.1525	39.161
0.3112	0.89	13000	0.1620	17.0021	5.6076	17.0374	17.1719	39.178
0.341	0.92	13500	0.1638	17.1664	5.7473	17.2279	17.3384	40.1864
0.2885	0.95	14000	0.1609	17.1664	5.7931	17.2504	17.3565	40.1356
0.3335	0.99	14500	0.1587	17.0762	5.6336	17.1181	17.2316	40.2034
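
The last row (step 14500 at epoch 0.99) allows a back-of-the-envelope estimate of the run length and of the learning rate at that point. The sketch below assumes zero warmup steps, a single device, and no gradient accumulation, none of which the card states. The epoch column is rounded to two decimals, so the totals are approximate. The function names are illustrative.

```python
def estimate_total_steps(step, epoch_fraction):
    """Estimate steps per epoch from one (step, epoch) pair in the table."""
    return round(step / epoch_fraction)


def linear_lr(step, total_steps, base_lr=5e-5, warmup_steps=0):
    """Linear warmup followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * (step / max(1, warmup_steps))
    remaining = max(0, total_steps - step)
    return base_lr * (remaining / max(1, total_steps - warmup_steps))


total = estimate_total_steps(14500, 0.99)   # roughly 14.6k optimizer steps
approx_examples = total * 8                 # batch size 8, so roughly 117k examples

print(total, approx_examples)
print(linear_lr(14500, total))              # learning rate near the end of training
```

Under these assumptions the learning rate at the final evaluation is about 1% of its initial value, which is consistent with the validation loss flattening over the last few rows.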

Framework versions

Transformers 4.26.0
PyTorch 1.7.1+cu110
Datasets 2.9.0
Tokenizers 0.13.2

summervent/speller-t5-9001
