Training dataset for the N-gram LM

#3
by kamil - opened

Hello!

Thank you for your amazing contribution. Could you please tell me which dataset you used for the LM training?

Was it just a merge of all the sentences you used for the W2V2 fine-tuning?

Thanks

Hi @kamil ,

Thank you for your interest!

Yes, I only used the sentences from the ASR training set to train the N-gram LM. To improve performance, you can collect additional text data and combine it with the existing set.
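
For anyone who wants to try this, here is a minimal sketch of how such a corpus could be assembled. It is not the exact script used for this model: the function names (`normalize`, `build_corpus`, `write_corpus`) and the normalization scheme are assumptions for illustration, and the cleaning should be adapted to match the vocabulary of your acoustic model.

```python
import re
from itertools import chain
from pathlib import Path


def normalize(sentence: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace.

    Assumed scheme: adjust it to match the ASR model's output vocabulary.
    """
    sentence = sentence.lower()
    # Keep word characters (letters, including accented ones, digits,
    # underscore), apostrophes, and whitespace; everything else becomes a space.
    sentence = re.sub(r"[^\w\s']", " ", sentence)
    return re.sub(r"\s+", " ", sentence).strip()


def build_corpus(asr_sentences, extra_sentences=()):
    """Merge ASR transcripts with optional extra text, one cleaned sentence each."""
    corpus = []
    for sentence in chain(asr_sentences, extra_sentences):
        cleaned = normalize(sentence)
        if cleaned:  # drop sentences that are empty after cleaning
            corpus.append(cleaned)
    return corpus


def write_corpus(corpus, path):
    """Write one sentence per line, the usual input format for n-gram toolkits."""
    Path(path).write_text("\n".join(corpus) + "\n", encoding="utf-8")
```

Calling `write_corpus(build_corpus(train_transcripts, extra_text), "corpus.txt")` produces a plain-text file that can then be passed to an n-gram toolkit of your choice (for example KenLM, which is an assumption here since the toolkit is not named above).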

bofenghuang changed discussion status to closed
