---
language:
- en
metrics:
- wer
- bleu
- google_bleu
tags:
- ASR
- Error Correction
- Crossmodal
---
### Model Description

#### Pre-Training Settings

166k samples from Common Voice 13.0 were recognized by Whisper tiny.en.

1,000 random samples were selected as the test set, and the rest were used for training and validation with an 80%/20% split.
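
The data split described above can be sketched as follows; the function name and random seed are illustrative, not from the released code:

```python
import random

def split_dataset(samples, test_size=1000, val_frac=0.20, seed=42):
    """Hold out a random test set, then split the remainder 80%/20%
    into train and validation, mirroring the protocol above.
    (Function name and seed are illustrative.)"""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    test = shuffled[:test_size]
    rest = shuffled[test_size:]
    n_val = int(len(rest) * val_frac)
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

# With 166k recognized samples: 1,000 test, 33,000 validation, 132,000 train.
train, val, test = split_dataset(range(166_000))
```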
- Batch size: 256
- Initial learning rate: 1e-5
- Adam optimizer
- 30 epochs
- Cross-entropy loss
- Best checkpoint saved based on WER as the evaluation metric
- Decoding performed with beam search, beam size 5
- S2S backbone model adopted from "[Exploring data augmentation for code generation tasks](https://aclanthology.org/2023.findings-eacl.114/)"
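
The hyperparameters listed above can be gathered into a single config object; the class and field names below are illustrative, not from the released training script:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PreTrainConfig:
    """Pre-training hyperparameters from the list above.
    (Class and field names are illustrative.)"""
    batch_size: int = 256
    learning_rate: float = 1e-5
    optimizer: str = "adam"
    num_epochs: int = 30
    loss: str = "cross_entropy"
    metric_for_best_model: str = "wer"  # best checkpoint selected by WER
    num_beams: int = 5                  # beam search width at decoding time

cfg = PreTrainConfig()
```

These values map naturally onto a typical seq2seq trainer configuration (per-device batch size, learning rate, epoch count, metric for checkpoint selection); the actual training script may organize them differently.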
#### Continued-Training Settings

- 2 epochs of gold-gold training on "[TED talk data](https://cris.fbk.eu/bitstream/11582/104409/1/WIT3-EAMT2012.pdf)" to prevent the over-correction problem
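
Since checkpoints are selected (and the model evaluated) by WER, a minimal reference implementation of the metric may be useful; this is the standard word-level Levenshtein formulation, not code from this repository:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance divided by
    the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j - 1] + cost,  # substitution / match
                          d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1)         # insertion
    return d[len(ref)][len(hyp)] / len(ref)

# One substitution in three reference words -> WER = 1/3.
print(wer("the quick fox", "the quick box"))
```

In practice a library such as jiwer or Hugging Face `evaluate` computes the same quantity; the sketch is only to make the metric concrete.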