Number of training samples?

by JayKe700 - opened Feb 20, 2023

Feb 20, 2023

Hi alvanlii

I tried to reproduce your experimental results and found that the amount of data does not match. From your epochs, training steps, and batch size, I infer that your training set contains about 180,000 samples in total. However, the three datasets you listed (Common Voice 11 Canto Train Set, CantoMap, Cantonese-ASR) give me only about 97,600 samples in total. Have I missed any information?
Looking forward to your reply.

Thanks
Jayke
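
The inference in the question above can be sketched as simple arithmetic: steps times batch size gives the number of samples seen, and dividing by the number of epochs gives the dataset size. The step count, batch size, and epoch count below are hypothetical placeholders chosen to land on roughly 180,000; they are not taken from the model card. Only the 97,600 figure comes from the post.

```python
def estimate_num_samples(total_steps: int, effective_batch_size: int, num_epochs: float) -> float:
    """Estimate dataset size from training hyperparameters.

    effective_batch_size should already include gradient accumulation
    and the number of devices, if any.
    """
    return total_steps * effective_batch_size / num_epochs


# Hypothetical values for illustration only (not from the model card).
estimated = estimate_num_samples(total_steps=11_250, effective_batch_size=64, num_epochs=4)

# Sample count reported in the post for the three listed datasets.
observed = 97_600

# Ratio of inferred size to the size actually obtained.
ratio = estimated / observed
print(f"estimated={estimated:.0f}, observed={observed}, ratio={ratio:.2f}")
```

A ratio close to 2 would suggest that each sample is seen about twice per epoch, for example because the dataset was duplicated.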

alvanlii

Owner Feb 20, 2023

Hi Jayke, I doubled the training data and applied augmentation differently on the duplicated set. Sorry, I should have been clearer about that.
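
One possible reading of the reply above is sketched below: the original samples are kept, and a second, augmented copy of each sample is appended, which doubles the sample count. The helper names and the Gaussian-noise augmentation are assumptions made for illustration; the thread does not say which augmentations were actually used or whether the first copy was augmented as well.

```python
import random


def add_noise(waveform, noise_level=0.005, rng=None):
    """Hypothetical augmentation: add small Gaussian noise to each value."""
    rng = rng or random.Random()
    return [x + rng.gauss(0.0, noise_level) for x in waveform]


def double_with_augmentation(samples, augment):
    """Return the original samples followed by an augmented copy of each."""
    return list(samples) + [augment(s) for s in samples]


# Toy waveforms standing in for real audio.
samples = [[0.0, 0.1], [0.2, 0.3]]
rng = random.Random(0)
doubled = double_with_augmentation(samples, lambda w: add_noise(w, rng=rng))

# Doubling the 97,600 samples reported in the question.
doubled_count = 2 * 97_600
print(len(doubled), doubled_count)
```

Doubling 97,600 gives 195,200, which is in the same range as the roughly 180,000 inferred in the question.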

JayKe700

Feb 20, 2023

Hi alvanlii, thanks for your reply, I got it.

alvanlii changed discussion status to closed Feb 22
