
speller-t5-900

This model is a fine-tuned version of sberbank-ai/ruT5-base on an unspecified dataset. It achieves the following results on the evaluation set (a sketch of how such ROUGE scores are typically computed follows the list):

  • Loss: 0.1758
  • ROUGE-1: 19.3503
  • ROUGE-2: 8.3333
  • ROUGE-L: 19.3503
  • ROUGE-Lsum: 19.3503
  • Gen Len (mean length of generated sequences, in tokens): 41.4153
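For reference, scores in this style are usually produced with the Hugging Face `evaluate` package. The sketch below is illustrative only: the prediction and reference strings are placeholders, since this model's evaluation data is not published with the card.

```python
import evaluate

# Load the ROUGE metric; `compute` returns a dict keyed by
# rouge1, rouge2, rougeL, and rougeLsum.
rouge = evaluate.load("rouge")

predictions = ["привет как дела"]   # hypothetical model output
references = ["привет, как дела?"]  # hypothetical gold correction

scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```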

Model description

More information needed

Intended uses & limitations

More information needed
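Although the card gives no usage details, loading the checkpoint follows the standard transformers seq2seq pattern. In this sketch the hub id is hypothetical, and the spelling-correction use case is inferred from the "speller" name rather than documented:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Hypothetical repository id -- substitute the actual hub path of this checkpoint.
model_id = "speller-t5-900"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Russian input with deliberate misspellings; spelling correction as the
# intended task is an assumption based on the model name.
text = "Превед, как дила?"
inputs = tokenizer(text, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```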

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the configuration sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
  • mixed_precision_training: Native AMP
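Expressed as a Seq2SeqTrainingArguments configuration, the values above would look roughly like this; `output_dir` and `predict_with_generate` are assumptions, not taken from the card:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="speller-t5-900",     # assumed output location
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    adam_beta1=0.9,                  # Adam with betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    fp16=True,                       # "Native AMP" mixed precision
    predict_with_generate=True,      # assumed: needed for ROUGE / Gen Len eval
)
```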

Training results

| Training Loss | Epoch | Step | Validation Loss | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-Lsum | Gen Len |
|---|---|---|---|---|---|---|---|---|
| 1.0227 | 0.03 | 500 | 0.5411 | 17.6201 | 7.1186 | 17.6554 | 17.5847 | 45.5424 |
| 0.7224 | 0.07 | 1000 | 0.4269 | 18.1497 | 7.1186 | 18.1497 | 17.9732 | 42.7797 |
| 0.7101 | 0.1 | 1500 | 0.3542 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.3983 |
| 0.5962 | 0.14 | 2000 | 0.3283 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.2542 |
| 0.535 | 0.17 | 2500 | 0.3104 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.2627 |
| 0.6124 | 0.2 | 3000 | 0.2843 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.4915 |
| 0.491 | 0.24 | 3500 | 0.2706 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.4322 |
| 0.5028 | 0.27 | 4000 | 0.2647 | 19.5429 | 8.5876 | 19.5429 | 19.5621 | 42.3898 |
| 0.4547 | 0.31 | 4500 | 0.2548 | 18.9972 | 7.9661 | 18.9972 | 18.9619 | 42.178 |
| 0.4335 | 0.34 | 5000 | 0.2448 | 19.5429 | 8.5876 | 19.5429 | 19.5621 | 42.178 |
| 0.4511 | 0.38 | 5500 | 0.2377 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 42.3305 |
| 0.4765 | 0.41 | 6000 | 0.2337 | 19.5429 | 8.5876 | 19.5429 | 19.5621 | 41.4237 |
| 0.4355 | 0.44 | 6500 | 0.2233 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 41.7881 |
| 0.3924 | 0.48 | 7000 | 0.2172 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 40.9492 |
| 0.3898 | 0.51 | 7500 | 0.2153 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 41.6356 |
| 0.4236 | 0.55 | 8000 | 0.2102 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 41.0254 |
| 0.3484 | 0.58 | 8500 | 0.2116 | 19.4915 | 8.5876 | 19.4915 | 19.4915 | 41.8305 |
| 0.5514 | 0.61 | 9000 | 0.2017 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.1864 |
| 0.3298 | 0.65 | 9500 | 0.1945 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2966 |
| 0.3807 | 0.68 | 10000 | 0.1966 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.6525 |
| 0.3177 | 0.72 | 10500 | 0.1918 | 19.3503 | 8.3333 | 19.3503 | 19.3503 | 41.2627 |
| 0.3374 | 0.75 | 11000 | 0.1903 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2373 |
| 0.3123 | 0.78 | 11500 | 0.1900 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2203 |
| 0.3377 | 0.82 | 12000 | 0.1847 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2712 |
| 0.3138 | 0.85 | 12500 | 0.1814 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.1864 |
| 0.335 | 0.89 | 13000 | 0.1784 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.1695 |
| 0.3142 | 0.92 | 13500 | 0.1768 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2542 |
| 0.3245 | 0.95 | 14000 | 0.1753 | 19.6328 | 8.7571 | 19.5975 | 19.6328 | 41.2034 |
| 0.3277 | 0.99 | 14500 | 0.1758 | 19.3503 | 8.3333 | 19.3503 | 19.3503 | 41.4153 |

Framework versions

  • Transformers 4.26.0
  • PyTorch 1.7.1+cu110
  • Datasets 2.9.0
  • Tokenizers 0.13.2