hubert-large-ll60k-librispeech-clean-100h-demo-dist

This model was trained from scratch. The card metadata does not name the training dataset (it is recorded as "None"). It achieves the following results on the evaluation set:

Cer: 0.0316
Loss: 0.2143
Wer: 0.0995
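
For readers unfamiliar with the two error metrics, the sketch below shows the usual definitions: word error rate (WER) and character error rate (CER) are the Levenshtein edit distance between reference and hypothesis, divided by the reference length in words or characters. The card does not say which tool computed its numbers or how they were aggregated over the evaluation set, so the helper names and the example sentences here are illustrative assumptions only.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference token
                            curr[j - 1] + 1,      # insert a hypothesis token
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate for one utterance (hypothetical helper, not from the card)."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate for one utterance (hypothetical helper, not from the card)."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# Made-up example strings, not taken from the evaluation set.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"WER = {wer(reference, hypothesis):.4f}")
print(f"CER = {cer(reference, hypothesis):.4f}")
```

Read this way, a Wer of 0.0995 means roughly one word-level edit per ten reference words, and a Cer of 0.0316 roughly three character-level edits per hundred reference characters.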

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.0003
train_batch_size: 32
eval_batch_size: 8
seed: 42
distributed_type: multi-GPU
num_devices: 8
total_train_batch_size: 256
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: linear
lr_scheduler_warmup_steps: 500
num_epochs: 50.0
mixed_precision_training: Native AMP
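
The derived values in this list follow from simple arithmetic, and the learning-rate curve follows from the warmup and scheduler settings. The sketch below reproduces both in plain Python. It assumes the standard linear-with-warmup shape (ramp from 0 to the peak rate over the warmup steps, then linear decay to 0 at the last step), and it takes the total step count, 5600, from the final row of the training results table. The function name `linear_lr` is hypothetical.

```python
# Values copied from the hyperparameter list.
train_batch_size = 32   # per device
eval_batch_size = 8     # per device
num_devices = 8
learning_rate = 3e-4    # peak rate
warmup_steps = 500
num_epochs = 50

# Taken from the last row of the training results table (step 5600 at epoch 50.0).
total_steps = 5600

total_train_batch_size = train_batch_size * num_devices
total_eval_batch_size = eval_batch_size * num_devices
steps_per_epoch = total_steps // num_epochs


def linear_lr(step):
    """Assumed schedule: linear warmup to the peak rate, then linear decay to 0."""
    if step < warmup_steps:
        return learning_rate * step / warmup_steps
    remaining = max(0, total_steps - step)
    return learning_rate * remaining / (total_steps - warmup_steps)


print("total_train_batch_size:", total_train_batch_size)
print("total_eval_batch_size:", total_eval_batch_size)
print("steps_per_epoch:", steps_per_epoch)
for step in (0, 250, 500, 3050, 5600):
    print(f"step {step:>4}: lr = {linear_lr(step):.6f}")
```

With 112 optimizer steps per epoch and 256 samples per step, one epoch covers at most 112 * 256 = 28,672 training samples. The last batch of an epoch may be partial, so the true count can be slightly lower.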

Training results

Training Loss	Epoch	Step	Cer	Validation Loss	Wer
2.911	0.89	100	1.0	2.9202	1.0
2.6638	1.79	200	1.0	2.6310	1.0
0.3898	2.68	300	0.0968	0.3892	0.3366
0.2156	3.57	400	0.0591	0.2250	0.2090
0.1517	4.46	500	0.0474	0.1834	0.1695
0.1059	5.36	600	0.0428	0.1668	0.1502
0.0825	6.25	700	0.0393	0.1662	0.1406
0.0679	7.14	800	0.0393	0.1747	0.1357
0.0602	8.04	900	0.0390	0.1767	0.1334
0.0587	8.93	1000	0.0376	0.1708	0.1292
0.0517	9.82	1100	0.0372	0.1677	0.1255
0.0413	10.71	1200	0.0361	0.1771	0.1234
0.0418	11.61	1300	0.0358	0.1731	0.1229
0.0424	12.5	1400	0.0348	0.1796	0.1191
0.0469	13.39	1500	0.0358	0.1848	0.1207
0.0414	14.29	1600	0.0367	0.1863	0.1213
0.0338	15.18	1700	0.0347	0.1889	0.1177
0.0334	16.07	1800	0.0360	0.1900	0.1188
0.0315	16.96	1900	0.0346	0.1901	0.1158
0.0317	17.86	2000	0.0341	0.1790	0.1134
0.0264	18.75	2100	0.0356	0.1864	0.1159
0.0271	19.64	2200	0.0341	0.1861	0.1150
0.0272	20.54	2300	0.0339	0.1945	0.1129
0.0278	21.43	2400	0.0343	0.1950	0.1131
0.0254	22.32	2500	0.0330	0.2015	0.1097
0.0204	23.21	2600	0.0326	0.1952	0.1069
0.0259	24.11	2700	0.0330	0.1976	0.1103
0.0325	25.0	2800	0.0328	0.1958	0.1088
0.0359	25.89	2900	0.0346	0.1908	0.1105
0.0265	26.79	3000	0.0337	0.1991	0.1096
0.0223	27.68	3100	0.0345	0.1948	0.1107
0.025	28.57	3200	0.0330	0.2046	0.1077
0.0242	29.46	3300	0.0335	0.2055	0.1072
0.0187	30.36	3400	0.0307	0.1980	0.1021
0.0219	31.25	3500	0.0322	0.1998	0.1054
0.0198	32.14	3600	0.0322	0.2104	0.1048
0.0181	33.04	3700	0.0325	0.2093	0.1050
0.0166	33.93	3800	0.0315	0.2120	0.1032
0.0212	34.82	3900	0.0300	0.2021	0.1003
0.0214	35.71	4000	0.0316	0.2045	0.1033
0.016	36.61	4100	0.0302	0.2022	0.1000
0.0169	37.5	4200	0.0299	0.2060	0.0996
0.0191	38.39	4300	0.0307	0.2114	0.1006
0.0218	39.29	4400	0.0314	0.2066	0.1015
0.0182	40.18	4500	0.0300	0.2054	0.0988
0.0185	41.07	4600	0.0303	0.2050	0.0994
0.0171	41.96	4700	0.0306	0.2136	0.0994
0.0171	42.86	4800	0.0318	0.2062	0.1007
0.0161	43.75	4900	0.0319	0.2101	0.1013
0.0168	44.64	5000	0.0306	0.2111	0.0985
0.015	45.54	5100	0.0318	0.2110	0.1003
0.0126	46.43	5200	0.0319	0.2086	0.0999
0.0153	47.32	5300	0.0310	0.2095	0.0981
0.0172	48.21	5400	0.0310	0.2130	0.0985
0.017	49.11	5500	0.0316	0.2137	0.0994
0.0152	50.0	5600	0.0316	0.2143	0.0995
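
The metrics reported at the top of this card match the last row of the table (step 5600), which is not the best row for any of the three evaluation columns. A minimal sketch for finding the best checkpoints is shown below. To stay short it embeds only a handful of rows copied from the table. The column keys and the `parse` helper are hypothetical names, and the full table can be pasted into `ROWS` in the same whitespace-separated form.

```python
COLUMNS = ("train_loss", "epoch", "step", "cer", "val_loss", "wer")

# A subset of rows copied from the training results table.
ROWS = """
0.0825 6.25 700 0.0393 0.1662 0.1406
0.0169 37.5 4200 0.0299 0.2060 0.0996
0.0182 40.18 4500 0.0300 0.2054 0.0988
0.0168 44.64 5000 0.0306 0.2111 0.0985
0.0153 47.32 5300 0.0310 0.2095 0.0981
0.0152 50.0 5600 0.0316 0.2143 0.0995
"""


def parse(rows):
    """Turn whitespace-separated log rows into a list of dicts keyed by COLUMNS."""
    records = []
    for line in rows.strip().splitlines():
        record = dict(zip(COLUMNS, (float(v) for v in line.split())))
        record["step"] = int(record["step"])
        records.append(record)
    return records


records = parse(ROWS)
best_wer = min(records, key=lambda r: r["wer"])
best_cer = min(records, key=lambda r: r["cer"])
best_loss = min(records, key=lambda r: r["val_loss"])
final = records[-1]

print(f"lowest WER:             {best_wer['wer']:.4f} at step {best_wer['step']}")
print(f"lowest CER:             {best_cer['cer']:.4f} at step {best_cer['step']}")
print(f"lowest validation loss: {best_loss['val_loss']:.4f} at step {best_loss['step']}")
print(f"final row:              WER {final['wer']:.4f} at step {final['step']}")
```

The rows selected here are also the minima of the full table: the lowest Wer (0.0981) occurs at step 5300, the lowest Cer (0.0299) at step 4200, and the lowest validation loss (0.1662) at step 700. Validation loss drifts upward after that point while Wer and Cer keep improving slowly.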

Framework versions

Transformers 4.39.0.dev0
Pytorch 2.0.1+cu117
Datasets 2.8.0
Tokenizers 0.15.2
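
When trying to reproduce these results, it can help to compare the local environment against the versions listed above. The sketch below does that with the standard-library `importlib.metadata` module. The mapping from the names in this card to PyPI distribution names (for example, PyTorch is installed as `torch`) is an assumption, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed in this card, keyed by assumed PyPI distribution name.
CARD_VERSIONS = {
    "transformers": "4.39.0.dev0",
    "torch": "2.0.1+cu117",
    "datasets": "2.8.0",
    "tokenizers": "0.15.2",
}


def compare_versions(expected):
    """Return {package: (card_version, installed_version_or_None)}."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed)
    return report


for name, (wanted, installed) in compare_versions(CARD_VERSIONS).items():
    status = "match" if installed == wanted else "differs"
    print(f"{name}: card={wanted} installed={installed} ({status})")
```

A mismatch does not necessarily prevent the model from loading. Note that the Transformers entry is a development build (`4.39.0.dev0`), so an exact match may not be installable from a package index.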

macabdul9/hubert-large-ll60k-librispeech-clean-100h-demo-dist
