YC-Li committed on
Commit
520b962
1 Parent(s): ca94c75

Update README.md

  1. README.md +30 -1
README.md CHANGED
@@ -4,8 +4,37 @@ language:
  metrics:
  - wer
  - bleu
+ - google_bleu
  tags:
  - ASR
  - Error Correction
  - Crossmodal
- ---
+ ---
+
+ ### Model Description
+
+ Pre-Training Settings:
+
+ 166k samples from Common Voice 13.0 were transcribed by Whisper tiny.en.
+
+ 1,000 random samples were selected as the test set, and the rest were used for training and validation with an 80%-20% split.
+
+ - Batch size: 256
+
+ - Initial learning rate: 1e-5
+
+ - Adam optimizer
+
+ - 30 epochs
+
+ - Cross-entropy loss
+
+ - Best checkpoint saved based on WER as the evaluation metric
+
+ - Decoding is performed using beam search with a beam size of 5
+
+ - S2S backbone model adopted from "[Exploring data augmentation for code generation tasks](https://aclanthology.org/2023.findings-eacl.114/)".
+
+ Continue-Training Setting:
+
+ - 2 epochs for gold-gold to prevent the over-correction problem on "[TED talk data](https://cris.fbk.eu/bitstream/11582/104409/1/WIT3-EAMT2012.pdf)"
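
The data split described in the added README text (1,000 random test samples, with the remainder divided 80%-20% between training and validation) could be sketched roughly as follows. This is only an illustration, not the author's code: the function name `split_dataset`, the seed, and the use of exactly 166,000 items as a stand-in for "166k" are all assumptions.

```python
import random


def split_dataset(samples, test_size=1000, train_fraction=0.8, seed=0):
    """Hold out `test_size` random samples, then split the rest train/validation.

    Hypothetical helper: the README states only the sizes and the 80%-20%
    ratio, not how the split was implemented or seeded.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    shuffled = list(samples)
    rng.shuffle(shuffled)

    test = shuffled[:test_size]
    remainder = shuffled[test_size:]

    # round() avoids an off-by-one from floating-point truncation
    n_train = round(len(remainder) * train_fraction)
    train = remainder[:n_train]
    validation = remainder[n_train:]
    return train, validation, test


# 166_000 stands in for the "166k" recognized Common Voice samples.
train, validation, test = split_dataset(range(166_000))
print(len(train), len(validation), len(test))
```

With exactly 166,000 inputs this yields 132,000 training, 33,000 validation, and 1,000 test samples; the real counts depend on the exact dataset size, which the README gives only approximately.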