---
language:
- en
metrics:
- wer
- bleu
- google_bleu
tags:
- ASR
- Error Correction
- Crossmodal
---

### Model Description

Pre-training settings: 166k samples from Common Voice 13.0 were recognized by Whisper tiny.en. 1,000 random samples were selected as the test set, and the rest were used for training and validation with an 80%/20% split.

- Batch size: 256
- Initial learning rate: 1e-5
- Adam optimizer
- 30 epochs
- Cross-entropy loss
- Best checkpoint saved based on WER as the evaluation metric
- Decoding performed with beam search, beam size 5
- S2S backbone model adopted from "[Exploring data augmentation for code generation tasks](https://aclanthology.org/2023.findings-eacl.114/)"

Continue-training settings:

- 2 epochs of gold-gold training on "[TED talk data](https://cris.fbk.eu/bitstream/11582/104409/1/WIT3-EAMT2012.pdf)" to prevent the over-correction problem
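
The pre-training hyperparameters above map onto a standard Hugging Face seq2seq training configuration. The sketch below is a minimal illustration, not the card's actual training script: the `t5-base` checkpoint name, the `optim` choice, and the WER `compute_metrics` hookup are assumptions.

```python
# Minimal sketch of the pre-training configuration listed above (assumptions:
# "t5-base" stands in for the actual S2S backbone; a compute_metrics function
# returning {"wer": ...} must be supplied for checkpoint selection by WER).
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="asr-correction-pretrain",
    per_device_train_batch_size=256,   # batch size: 256
    learning_rate=1e-5,                # initial learning rate: 1e-5
    optim="adamw_torch",               # Adam-family optimizer (Trainer default)
    num_train_epochs=30,               # 30 epochs
    eval_strategy="epoch",             # "evaluation_strategy" on older versions
    save_strategy="epoch",
    load_best_model_at_end=True,       # keep the best checkpoint...
    metric_for_best_model="wer",       # ...selected by WER
    greater_is_better=False,           # lower WER is better
    predict_with_generate=True,        # generate during eval to score WER/BLEU
    generation_num_beams=5,            # beam search with a beam size of 5
)
# Cross-entropy loss is the default objective for seq2seq LM fine-tuning,
# so no explicit loss setting is needed here.
```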
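
At inference time, the beam-search decoding (size 5) corresponds to a standard `generate` call. Again a sketch under assumptions: the checkpoint name and the example hypothesis are placeholders.

```python
# Sketch of beam-search decoding with beam size 5 (checkpoint name assumed;
# the input string is a hypothetical Whisper tiny.en transcript to correct).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("t5-base")      # assumed backbone
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # assumed backbone

hypothesis = "i wont to go too the store"  # hypothetical ASR output
inputs = tokenizer(hypothesis, return_tensors="pt")
output_ids = model.generate(**inputs, num_beams=5, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```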
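
For the continue-training stage, "gold-gold" training is commonly implemented by pairing each reference transcript with itself as both source and target, so the model learns to leave already-correct input untouched. The field names and sample sentences below are assumptions, not the actual TED talk preprocessing.

```python
# Sketch of building "gold-gold" pairs: the gold transcript serves as both
# source and target, discouraging the model from over-correcting clean input.
# The "source"/"target" field names and example sentences are assumptions.
def make_gold_gold(references):
    return [{"source": ref, "target": ref} for ref in references]

ted_refs = ["thank you so much chris", "it is truly a great honor"]
continue_train_set = make_gold_gold(ted_refs)  # trained for 2 epochs per the card
```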