libri-alpha-0.75-Temp-1-attention-3-layers-distil-with-6-layers-loss-att-take-2

This model was trained from scratch (the training dataset is not specified in the card metadata). It achieves the following results on the evaluation set:

  • Loss: 26.4101
  • Wer: 0.2791
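
The Wer metric above is the word error rate: the word-level edit distance between the model's transcript and the reference, divided by the number of reference words. A minimal self-contained sketch of the computation (evaluation pipelines typically use a library such as `jiwer` instead):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)
```

A WER of 0.2791 means roughly 28 word errors per 100 reference words.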

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0002
  • train_batch_size: 16
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.2
  • num_epochs: 40
  • mixed_precision_training: Native AMP
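
The scheduler settings above describe a linear schedule with a 20% warmup: the learning rate rises linearly from 0 to the peak (0.0002) over the first 20% of steps, then decays linearly back to 0. A minimal sketch of that shape (the function name and total step count are illustrative, not taken from the training code):

```python
def linear_lr_with_warmup(step: int, total_steps: int,
                          warmup_ratio: float = 0.2,
                          peak_lr: float = 2e-4) -> float:
    """Linear warmup to peak_lr, then linear decay to 0 at total_steps."""
    warmup_steps = int(warmup_ratio * total_steps)
    if step < warmup_steps:
        # Warmup phase: ramp from 0 up to peak_lr.
        return peak_lr * step / max(1, warmup_steps)
    # Decay phase: ramp from peak_lr down to 0.
    return peak_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))
```

Note the effective batch size follows from the listed values: train_batch_size 16 × gradient_accumulation_steps 2 = total_train_batch_size 32.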

Training results

| Training Loss | Epoch | Step | Validation Loss | Wer    |
|:-------------:|:-----:|:----:|:---------------:|:------:|
| 202.4293      | 0.45  | 200  | 26.7777         | 0.2779 |
| 197.6471      | 0.9   | 400  | 25.8300         | 0.2760 |
| 204.8931      | 1.35  | 600  | 25.6774         | 0.2747 |
| 193.3182      | 1.79  | 800  | 25.6049         | 0.2737 |
| 205.2241      | 2.24  | 1000 | 25.5552         | 0.2739 |
| 186.0407      | 2.69  | 1200 | 25.4364         | 0.2737 |
| 191.7055      | 3.14  | 1400 | 25.7949         | 0.2764 |
| 185.0721      | 3.59  | 1600 | 26.1202         | 0.2753 |
| 198.8579      | 4.04  | 1800 | 25.8496         | 0.2763 |
| 185.7877      | 4.48  | 2000 | 27.0753         | 0.2731 |
| 194.9394      | 4.93  | 2200 | 25.6920         | 0.2775 |
| 188.2296      | 5.38  | 2400 | 25.7362         | 0.2742 |
| 188.0202      | 5.83  | 2600 | 25.9170         | 0.2755 |
| 191.5541      | 6.28  | 2800 | 26.8590         | 0.2771 |
| 198.2817      | 6.73  | 3000 | 26.4101         | 0.2791 |

Framework versions

  • Transformers 4.24.0
  • Pytorch 1.12.1
  • Datasets 2.7.0
  • Tokenizers 0.11.0