byt5-small-finetuned-yiddish-experiment-9

This model is a fine-tuned version of google/byt5-small. The training dataset is not specified (the auto-generated card records it as "None"). It achieves the following results on the evaluation set:

Loss: 0.3473
CER (character error rate): 0.1505
WER (word error rate): 0.4678
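
For readers unfamiliar with the two error rates, here is a minimal sketch of how they are conventionally defined: edit distance between reference and prediction, divided by the reference length, counted in characters for CER and in whitespace-separated words for WER. The card does not say which library or aggregation produced the numbers above, so the function names and the per-sentence normalisation below are assumptions for illustration only, and the sample strings are invented.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete r from the reference
                            curr[j - 1] + 1,      # insert h into the reference
                            prev[j - 1] + cost))  # substitute, or match at no cost
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character-level edits per reference character."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate: word-level edits per reference word."""
    ref_words = reference.split()
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


# Illustrative strings (not taken from the evaluation set):
# one wrong character out of nine, which also makes one wrong word out of three.
print(cer("a gut yor", "a gut yar"))
print(wer("a gut yor", "a gut yar"))
```

A single wrong character makes the whole word wrong, which is why a WER several times larger than the CER, as reported here, is common for character-level or byte-level models.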

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
gradient_accumulation_steps: 2
total_train_batch_size: 8
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 600
num_epochs: 30
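
Two of the values above follow from the others, and the sketch below works them out in plain Python. It assumes a single training device (the card does not state the device count) and a total of 6360 optimizer steps, inferred from the results table, where step 5300 falls at epoch 25.0 (212 steps per epoch, times 30 epochs). The schedule is the usual linear warmup followed by a half-cosine decay to zero; the constant and function names are illustrative, not taken from the training script.

```python
import math

PER_DEVICE_BATCH = 4   # train_batch_size
GRAD_ACCUM_STEPS = 2   # gradient_accumulation_steps
NUM_DEVICES = 1        # assumption: device count is not stated on the card
BASE_LR = 1e-5         # learning_rate
WARMUP_STEPS = 600     # lr_scheduler_warmup_steps
TOTAL_STEPS = 6360     # assumption: 212 steps per epoch * 30 epochs


def effective_batch_size():
    """Examples contributing to each optimizer update."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES


def learning_rate(step):
    """Linear warmup to BASE_LR, then cosine decay to zero at TOTAL_STEPS."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


print(effective_batch_size())  # 4 * 2 * 1, matching total_train_batch_size
for step in (0, 300, 600, 3480, 6360):
    print(step, learning_rate(step))
```

Under these assumptions the learning rate peaks at step 600 and is halfway through its decay at step 3480.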

Training results

Training Loss	Epoch	Step	Validation Loss	CER	WER
10.7996	0.4728	100	10.9325	0.2905	0.7232
7.586	0.9456	200	10.5771	0.2698	0.6850
8.641	1.4161	300	10.0041	0.2570	0.6571
8.2901	1.8889	400	9.1435	0.2478	0.6396
8.076	2.3593	500	8.1677	0.2394	0.6277
7.8061	2.8322	600	7.0784	0.2317	0.6142
5.6829	3.3026	700	6.0549	0.2234	0.6094
5.343	3.7754	800	5.0819	0.2187	0.6038
4.8853	4.2459	900	4.2224	0.2157	0.6038
3.8875	4.7187	1000	3.5281	0.2123	0.5990
3.4853	5.1891	1100	2.8204	0.2095	0.5935
2.7984	5.6619	1200	2.2737	0.2039	0.5895
2.2336	6.1324	1300	1.7448	0.2016	0.5823
1.8465	6.6052	1400	1.2905	0.1959	0.5736
1.6188	7.0757	1500	1.1662	0.1945	0.5688
1.3051	7.5485	1600	1.1433	0.1939	0.5704
1.176	8.0189	1700	1.0655	0.1910	0.5672
1.0653	8.4917	1800	0.8529	0.1863	0.5561
0.8965	8.9645	1900	0.7841	0.1686	0.4972
0.7726	9.4350	2000	0.7415	0.1649	0.4956
0.7771	9.9078	2100	0.6933	0.1629	0.4885
0.7366	10.3783	2200	0.6601	0.1616	0.4861
0.6566	10.8511	2300	0.6124	0.1593	0.4853
0.6469	11.3215	2400	0.5665	0.1604	0.4829
0.6077	11.7943	2500	0.5210	0.1576	0.4805
0.5543	12.2648	2600	0.4658	0.1576	0.4781
0.5217	12.7376	2700	0.4372	0.1559	0.4781
0.5023	13.2080	2800	0.4111	0.1570	0.4805
0.4754	13.6809	2900	0.3967	0.1554	0.4741
0.4551	14.1513	3000	0.3880	0.1545	0.4726
0.4416	14.6241	3100	0.3800	0.1538	0.4741
0.4255	15.0946	3200	0.3752	0.1542	0.4749
0.4306	15.5674	3300	0.3724	0.1544	0.4741
0.4072	16.0378	3400	0.3663	0.1538	0.4741
0.4196	16.5106	3500	0.3606	0.1528	0.4726
0.3983	16.9835	3600	0.3635	0.1530	0.4694
0.3915	17.4539	3700	0.3605	0.1524	0.4694
0.4036	17.9267	3800	0.3563	0.1517	0.4686
0.3893	18.3972	3900	0.3558	0.1524	0.4686
0.3846	18.8700	4000	0.3562	0.1525	0.4678
0.3854	19.3404	4100	0.3530	0.1516	0.4670
0.3859	19.8132	4200	0.3523	0.1521	0.4678
0.3777	20.2837	4300	0.3516	0.1519	0.4670
0.3729	20.7565	4400	0.3502	0.1516	0.4678
0.3753	21.2270	4500	0.3497	0.1517	0.4678
0.3712	21.6998	4600	0.3502	0.1514	0.4686
0.3757	22.1702	4700	0.3487	0.1508	0.4678
0.3716	22.6430	4800	0.3488	0.1510	0.4678
0.369	23.1135	4900	0.3479	0.1507	0.4678
0.3808	23.5863	5000	0.3473	0.1505	0.4678
0.3696	24.0567	5100	0.3472	0.1511	0.4686
0.3718	24.5296	5200	0.3468	0.1508	0.4678
0.3651	25.0	5300	0.3466	0.1511	0.4686
0.3747	25.4728	5400	0.3467	0.1508	0.4686
0.3661	25.9456	5500	0.3468	0.1508	0.4686
0.3558	26.4161	5600	0.3472	0.1513	0.4686
0.3782	26.8889	5700	0.3469	0.1511	0.4686
0.3636	27.3593	5800	0.3467	0.1511	0.4686
0.3679	27.8322	5900	0.3466	0.1510	0.4678
0.3615	28.3026	6000	0.3465	0.1511	0.4678
0.3688	28.7754	6100	0.3466	0.1511	0.4678
0.3599	29.2459	6200	0.3466	0.1511	0.4678
0.3696	29.7187	6300	0.3465	0.1511	0.4678
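
The headline metrics at the top of this card (loss 0.3473, CER 0.1505, WER 0.4678) coincide with the step 5000 row, which has the lowest CER in the table, rather than with the final row. The card does not state how the reported checkpoint was chosen, so the sketch below only shows how one might pick a row by a given metric. The tuple layout and the `best_by` helper are hypothetical, and only four rows are copied from the table for brevity.

```python
# (step, validation_loss, cer, wer), copied from four rows of the table above
ROWS = [
    (4900, 0.3479, 0.1507, 0.4678),
    (5000, 0.3473, 0.1505, 0.4678),
    (5100, 0.3472, 0.1511, 0.4686),
    (6300, 0.3465, 0.1511, 0.4678),
]

LOSS, CER = 1, 2  # column indices into each row


def best_by(rows, column):
    """Return the row with the smallest value in the given column."""
    return min(rows, key=lambda row: row[column])


best_cer_row = best_by(ROWS, CER)
best_loss_row = best_by(ROWS, LOSS)
print("lowest CER at step", best_cer_row[0])
print("lowest validation loss at step", best_loss_row[0])
```

Among these rows the two criteria disagree: the lowest CER is at step 5000 and the lowest validation loss is at step 6300, so the choice of selection metric matters when comparing checkpoints.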

Framework versions

Transformers 4.47.0
Pytorch 2.5.1+cu121
Datasets 2.14.4
Tokenizers 0.21.0

Model repository: Addaci/byt5-small-finetuned-yiddish-experiment-9
