---
language:
  - en
metrics:
  - wer
  - bleu
  - google_bleu
tags:
  - ASR
  - Error Correction
  - Crossmodal
---

# Model Description

## Pre-Training Settings

166k samples from Common Voice 13.0 were recognized by Whisper tiny.en.

1,000 random samples were selected as the test set, and the rest was used for training and validation with an 80%/20% split.
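
A minimal sketch of this split, assuming the Whisper hypotheses and gold references are stored in a Hugging Face `datasets` table; the file name and column layout are placeholders, not the exact pipeline used for this model.

```python
# Sketch only: reproduce the described split with placeholder data.
from datasets import load_dataset

# Hypothetical CSV of (ASR hypothesis, gold reference) pairs from
# Common Voice 13.0 recognized by Whisper tiny.en.
data = load_dataset("csv", data_files="cv13_whisper_tiny_en.csv")["train"]

# Hold out 1,000 random samples as the test set.
split = data.train_test_split(test_size=1_000, seed=42)
test_set = split["test"]

# Split the remainder 80% / 20% into training and validation.
rest = split["train"].train_test_split(test_size=0.2, seed=42)
train_set, valid_set = rest["train"], rest["test"]
```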

- Batch size: 256
- Initial learning rate: 1e-5
- Adam optimizer
- 30 epochs
- Cross-entropy loss
- Best checkpoint saved based on WER as the evaluation metric
- Decoding is performed using beam search with a beam size of 5 (see the configuration sketch after this list)
- S2S backbone model adopted from "Exploring data augmentation for code generation tasks"
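
A minimal sketch of how these settings could be expressed with the Hugging Face `Seq2SeqTrainer`; the backbone name, dataset variables, and metric wiring are placeholders rather than the exact code behind this checkpoint.

```python
# Sketch only: the listed hyperparameters mapped onto Seq2SeqTrainingArguments.
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

backbone = "t5-base"  # placeholder for the S2S backbone
tokenizer = AutoTokenizer.from_pretrained(backbone)
model = AutoModelForSeq2SeqLM.from_pretrained(backbone)  # trained with cross-entropy loss

args = Seq2SeqTrainingArguments(
    output_dir="asr-error-correction",
    per_device_train_batch_size=256,  # batch size: 256
    learning_rate=1e-5,               # initial learning rate
    num_train_epochs=30,              # 30 epochs
    optim="adamw_torch",              # Adam-style optimizer
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="wer",      # best checkpoint selected by WER
    greater_is_better=False,
    predict_with_generate=True,
    generation_num_beams=5,           # beam search with a beam size of 5
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_set,  # from the split sketched above
    eval_dataset=valid_set,
    tokenizer=tokenizer,
    # compute_metrics would report WER, e.g. via evaluate.load("wer")
)
trainer.train()
```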

## Continued-Training Settings

- 2 epochs on gold-gold pairs to prevent the over-correction problem on TED talk data (see the sketch below)
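
A possible shape for this continued-training stage, reusing the hypothetical trainer from the sketch above; `ted_gold` (TED-talk reference transcripts used as both input and target) is an assumed placeholder dataset.

```python
# Sketch only: a short gold-gold pass to curb over-correction.
args.num_train_epochs = 2            # 2 additional epochs
trainer.train_dataset = ted_gold     # gold text used as both source and target
trainer.train()
```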